Abstract

Given a universal gate set on two qubits, it is well known that applying random gates from the
set to random pairs of qubits will eventually yield an approximately Haar-distributed unitary.
However, this requires exponential time. We show that random circuits of only polynomial
length will approximate the first and second moments of the Haar distribution, thus forming
approximate 1- and 2-designs. Previous constructions required longer circuits and worked only
for specific gate sets. As a corollary of our main result, we also improve previous bounds on the
convergence rate of random walks on the Clifford group.

1 Introduction: Pseudo-Random Quantum Circuits 


There are many examples of algorithms that make use of random states or unitary operators
(e.g. [E,l28]). However, exactly sampling from the uniform Haar distribution is inefficient. In many
cases, though, only pseudo-random operators are required. To quantify the extent to which the
pseudo-random operators behave like the uniform distribution we use the notion of $k$-designs (often
referred to as $t$-designs). A $k$-design has $k$-th moments equal to those of the Haar distribution. For
most uses of random states or unitaries, this is sufficient. Constructions of exact $k$-designs on
states are known (see [3] and references therein) and some are efficient. Ambainis and Emerson
[3] introduced the notion of approximate state $k$-designs, which can be implemented efficiently
for any $k$. However, the known constructions of unitary $k$-designs are inefficient to implement.
Approximate unitary 2-designs have been considered [HI QUI EE], although the approaches are
specific to 2-designs.

We consider a general class of random circuits where a series of two-qubit gates is chosen from
a universal gate set. We give a framework for analysing the $k$-th moments of these circuits. Our
conjecture, based on an analogous classical result [23], is that a random circuit on $n$ qubits of length
$\mathrm{poly}(n, k)$ is an approximate $k$-design. While we do not prove this, we instead give a tight analysis
of the $k = 2$ case. We find that in a broad class of natural random circuit models (described in
Section 1.1), a circuit of length $O(n(n + \log 1/\epsilon))$ yields an $\epsilon$-approximate 2-design. Our definition
of an approximate $k$-design is in Section 2.3.2. Our results also apply to an alternative definition of an
approximate 2-design from [10], for which we show random circuits of length $O(n(n + \log 1/\epsilon))$ yield
$\epsilon$-approximations, thus extending the results of that paper to a larger class of circuits. Moreover,
our results also apply to random stabiliser circuits, meaning that a random stabiliser circuit of
length $O(n(n + \log 1/\epsilon))$ will be an $\epsilon$-approximate 2-design. This both simplifies the construction
and tightens the efficiency of the approach of [14], which constructed $\epsilon$-approximate 2-designs in
time $O(n^6(n^2 + \log 1/\epsilon))$ using $O(n^3)$ elementary quantum gates.

1.1 Random Circuits 

The random circuit we will use is the following. Choose a 2-qubit gate set that is universal on
$U(4)$ (or on the stabiliser subgroup of $U(4)$). One example of this is the set of all one-qubit gates
together with the controlled-NOT gate. Another is simply the set of all of $U(4)$. Then, at each
step, choose a random pair of qubits and apply a gate from the universal set chosen uniformly at
random. For the $U(4)$ case, the distribution will be the Haar measure on $U(4)$. One such circuit
is shown in Fig. 1 for $n = 4$ qubits. This is based on the approach used in Refs. [9] but our
analysis is both simpler and more general.





Figure 1: An example of a random circuit. Different lines indicate a different gate is applied at 
each step. 
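
The sampling procedure of Section 1.1 can be sketched in a few lines for the $U(4)$ case. This is only an illustration under stated assumptions: the function names `haar_unitary` and `random_circuit` are hypothetical, and the Haar-distributed gate is produced by Gram-Schmidt orthonormalisation of independent complex Gaussian columns, a standard way to sample from the Haar measure that is not taken from the text.

```python
import random

def haar_unitary(dim, rng):
    """Haar-random unitary via (modified) Gram-Schmidt on complex Gaussian columns."""
    cols = []
    for _ in range(dim):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
        # Remove the components along the columns already built.
        for u in cols:
            overlap = sum(ui.conjugate() * vi for ui, vi in zip(u, v))
            v = [vi - overlap * ui for ui, vi in zip(u, v)]
        norm = sum(abs(vi) ** 2 for vi in v) ** 0.5
        cols.append([vi / norm for vi in v])
    # cols[c][r] holds entry (r, c); return the matrix in row-major form.
    return [[cols[c][r] for c in range(dim)] for r in range(dim)]

def random_circuit(n, length, rng):
    """Each step: a uniformly random pair of distinct qubits and a Haar U(4) gate."""
    return [
        (tuple(rng.sample(range(n), 2)), haar_unitary(4, rng))
        for _ in range(length)
    ]

# Hypothetical usage: a length-5 circuit on n = 4 qubits, as in Figure 1.
rng = random.Random(0)
circuit = random_circuit(4, 5, rng)
for (i, j), _gate in circuit:
    print("gate on qubits", i, j)
```

The circuit is kept as a list of (pair, gate) records rather than multiplied out into a $2^n \times 2^n$ matrix, since only the local structure is used in the analysis that follows.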



Since the universal set can generate the whole of $U(2^n)$ in this way, such random circuits can
produce any unitary. Further, since this process converges to a unitarily invariant distribution and
the Haar distribution is unique, the resulting unitary must be uniformly distributed amongst all
unitaries [15]. Therefore this process will eventually converge to a Haar-distributed unitary from
$U(2^n)$. This is proven rigorously in Lemma 3.7. However, a generic element of $U(2^n)$ has $4^n$ real
parameters, and thus to even have $\Omega(4^{-n})$ fidelity with the Haar distribution requires $\Omega(4^n)$ 2-qubit
unitaries. We address this problem by considering only the lower-order moments of the distribution
and showing these are nearly the same for random circuits as for Haar-distributed unitaries. This
claim is formally described in Theorem 2.10.

Our paper is organised as follows. In Section 2 we define unitary $k$-designs and explain how a
random circuit could be used to construct a $k$-design. In Section 3 we work out how the state
evolves after a single step of the random circuit. We then extend this to multiple steps in Section 4
and prove our general convergence results. A key simplification will be (following [26]) to map the
evolution of the second moments of the quantum circuit onto a classical Markov chain. We then
prove a tight convergence result for the case where the gates are chosen from $U(4)$ in Section 5.
This section contains most of the technical content of the paper. Using our bounds on mixing time
we put together the proof that random circuits yield approximate unitary 2-designs in Section 6.
Section 7 concludes with some discussion of applications.

2 Preliminaries



2.1 Pauli expansion 

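Much of the following will be done in the Pauli basis. The Pauli operators will be taken as
$\{\sigma_0, \sigma_1, \sigma_2, \sigma_3\}$ and defined to be

\[ \sigma_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]

If $|\psi\rangle \in \mathbb{C}^{2^n}$ is a state on $n$ qubits then we write $\psi = |\psi\rangle\langle\psi|$. We can expand $\psi$ in the Pauli basis
as

\[ \psi = 2^{-n/2} \sum_p \gamma(p)\, \sigma_p \tag{2.1} \]

where $\sigma_p = \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_n}$ for the string $p = p_1 \ldots p_n$. Inverting, the coefficients $\gamma(p)$ are given by

\[ \gamma(p) = 2^{-n/2} \operatorname{tr} \sigma_p \psi. \tag{2.2} \]

It is easy to show that the coefficients $\gamma(p)$ are real and, with the chosen normalisation, the squares
sum to $\operatorname{tr} \psi^2$, which is 1 for pure $\psi$. In general

\[ \sum_p \gamma^2(p) \le 1 \]

with equality if and only if $\psi$ is pure. Note also that $\operatorname{tr} \psi = 1$ is equivalent to $\gamma(0) = 2^{-n/2}$.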
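
As a quick numerical illustration of Eqns. (2.1) and (2.2), the sketch below computes the Pauli coefficients of a small state and checks the properties just stated. The helper names (`kron`, `pauli_string`, `pauli_coefficients`) and the choice of a two-qubit Bell state as the example are assumptions made here for illustration; they do not appear in the text.

```python
import itertools
import math

# Single-qubit Pauli matrices sigma_0..sigma_3 as nested lists.
PAULIS = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]

def pauli_string(p):
    """sigma_p = sigma_{p_1} (x) ... (x) sigma_{p_n} for a tuple p of digits 0..3."""
    out = [[1]]
    for digit in p:
        out = kron(out, PAULIS[digit])
    return out

def trace_of_product(a, b):
    """tr(a b), computed without forming the full matrix product."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))

def pauli_coefficients(psi, n):
    """gamma(p) = 2^{-n/2} tr(sigma_p psi), Eqn. (2.2), for every string p."""
    scale = 2 ** (-n / 2)
    return {p: scale * trace_of_product(pauli_string(p), psi)
            for p in itertools.product(range(4), repeat=n)}

# Example state (an assumption for illustration): (|00> + |11>)/sqrt(2).
amp = [1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2)]
bell = [[amp[i] * complex(amp[j]).conjugate() for j in range(4)] for i in range(4)]
gamma = pauli_coefficients(bell, 2)

# Purity check: the squared coefficients of a pure state should sum to 1.
print(sum(abs(g) ** 2 for g in gamma.values()))
```

Because the state is pure, the printed sum should be 1 up to rounding, and `gamma[(0, 0)]` should equal $2^{-n/2} = 1/2$.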

This notation is extended to states on $nk$ qubits by treating $\gamma$ as a function of $k$ strings from
$\{0, 1, 2, 3\}^n$. Thus a state $\rho$ on $nk$ qubits is written as

\[ \rho = 2^{-nk/2} \sum_{p_1, \ldots, p_k} \gamma_0(p_1, \ldots, p_k)\, \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k}. \tag{2.3} \]

2.2 $k$-designs

We will say that a $k$-design is efficient if the effort required to sample a state or unitary from the
design is polynomial in $n$ and $k$. Note that we do not require the number of states to be polynomial
because, even for approximate unitary designs, an exponential number of unitaries is required.
Rather, the number of random bits needed to specify an element of the design should be $\mathrm{poly}(n, k)$.

2.2.1 State designs 

A (state) $k$-design is an ensemble of states such that, when one state is chosen from the ensemble and
copied $k$ times, it is indistinguishable from a uniformly random state. This is a way of quantifying
the pseudo-randomness of the state and is a quantum analogue of $k$-wise independence. Hayashi
et al. [20] give an inefficient construction of $k$-designs for any $n$ and $k$.

The state $k$-design definition we use is due to Ref. [3]:

Definition 2.1. An ensemble of quantum states $\{p_i, |\psi_i\rangle\}$ is a state $k$-design if

\[ \sum_i p_i (|\psi_i\rangle\langle\psi_i|)^{\otimes k} = \int_\psi (|\psi\rangle\langle\psi|)^{\otimes k} \, d\psi \tag{2.4} \]

where the integration is taken over the left invariant Haar measure on the unit sphere in $\mathbb{C}^d$,
normalised so that $\int_\psi d\psi = 1$.

It is well known that the above integral is equal to $\Pi_{+k} / \binom{k+d-1}{k}$, where $\Pi_{+k}$ is the projector onto the
symmetric subspace of $k$ $d$-dimensional spaces. For a rigorous proof, see Ref. [16] and for a less
precise proof but from a quantum information perspective see Ref. [7].

2.2.2 Unitary designs 

A unitary $k$-design is, in a sense, a stronger version of a state design. Just as applying a Haar-random
unitary to an arbitrary pure state results in a uniformly random pure state, applying a
unitary chosen from a unitary $k$-design to an arbitrary pure state should result in a state $k$-design.
Another way to say this is that the state obtained by acting $U^{\otimes k}$, where $U$ is drawn from a unitary
$k$-design on $U(d)$, on any $d^k$-dimensional state should be indistinguishable from the case where $U$
is drawn uniformly from $U(d)$. Formally, we have:

Definition 2.2. Let $\{p_i, U_i\}$ be an ensemble of unitary operators. Define

\[ \mathcal{G}_W(\rho) = \sum_i p_i\, U_i^{\otimes k} \rho\, (U_i^\dagger)^{\otimes k} \tag{2.5} \]

and

\[ \mathcal{G}_H(\rho) = \int_U U^{\otimes k} \rho\, (U^\dagger)^{\otimes k} \, dU. \tag{2.6} \]

Then the ensemble is a unitary $k$-design iff $\mathcal{G}_W = \mathcal{G}_H$.

Unitary designs can also be defined in terms of polynomials, so that if $p$ is a polynomial with
degree $k$ in the matrix elements of $U$ and $k$ in the matrix elements of $U^*$, then averaging $p$ over
a unitary $k$-design should give the same answer as averaging over the Haar measure. To see the
equivalence with Definition 2.2, note that averaging a monomial over our ensemble can be expressed
as $\langle i_1, \ldots, i_k | \mathcal{G}_W(|j_1, \ldots, j_k\rangle\langle j'_1, \ldots, j'_k|) | i'_1, \ldots, i'_k \rangle$, and so if $\mathcal{G}_W = \mathcal{G}_H$ then any polynomial of
degree $k$ will have the same expectation over both distributions.

2.3 Approximate $k$-designs
2.3.1 Approximate state designs

Numerous examples of exact efficient state 2-design constructions are known (e.g. [8]) but general
exact constructions are not efficient in $n$ and $k$. Approximate state designs were first introduced
by Ambainis and Emerson [3] and they constructed efficient approximate state $k$-designs for any $k$.
Aaronson [1] also gives an efficient approximate construction.

We define approximate state designs as follows. 
Definition 2.3. An ensemble of quantum states $\{p_i, |\psi_i\rangle\}$ is an $\epsilon$-approximate state $k$-design if

\[ (1-\epsilon) \int_\psi (|\psi\rangle\langle\psi|)^{\otimes k} \, d\psi \;\le\; \sum_i p_i (|\psi_i\rangle\langle\psi_i|)^{\otimes k} \;\le\; (1+\epsilon) \int_\psi (|\psi\rangle\langle\psi|)^{\otimes k} \, d\psi. \tag{2.7} \]

In [3], a similar definition was proposed but with the additional requirement that the ensemble also
forms a 1-design (exactly), i.e.

\[ \sum_i p_i |\psi_i\rangle\langle\psi_i| = \int_\psi |\psi\rangle\langle\psi| \, d\psi. \]

This requirement was necessary there only so that a suitably normalised version of the ensemble
would form a POVM. We will not use it.

By taking the partial trace one can show that a $k$-design is a $k'$-design for $k' < k$. Thus approximate
$k$-designs are always at least approximate 1-designs.

2.3.2 Approximate unitary designs

It was shown in Ref. [4] that a quantum analogue of a one time pad requires $2n$ bits to exactly
randomise an $n$ qubit state. However, in Ref. [5] it was shown that $n + o(n)$ bits suffice to do
this approximately. Translated into $k$-design language, this says an exact unitary 1-design requires
$2^{2n}$ unitaries but can be done approximately with $2^{n+o(n)}$. So approximate designs can have fewer
unitaries than exact designs. Here, we are interested in improving the efficiency of implementing
the unitaries. There are no known efficient exact constructions of unitary $k$-designs; it is hoped
that our approach will yield approximate unitary designs efficiently.

We will require approximate unitary $k$-designs to be close in the diamond norm [24]:

Definition 2.4. The diamond norm of a superoperator $T$ is

\[ \|T\|_\diamond = \sup_d \sup_{X \ne 0} \frac{\|(T \otimes \mathrm{id}_d) X\|_1}{\|X\|_1}, \]

where $\mathrm{id}_d$ is the identity channel on $d$ dimensions.

Operationally, the diamond norm of the difference between two quantum operations tells us the
largest possible probability of distinguishing the two operations if we are allowed to have them act
on part of an arbitrary, possibly entangled, state. In the supremum over ancilla dimension $d$, it
can be shown that $d$ never needs to be larger than the dimension of the system that $T$ acts upon.
The diamond norm is closely related to completely bounded norms (cb-norms), in that $\|T\|_\diamond$ is the
cb-norm of $T^\dagger$ and can also be interpreted as the $L_1 \to L_1$ cb-norm of $T$ itself [11, 27].

We can now define approximate unitary $k$-designs.

Definition 2.5. $\mathcal{G}_W$ is an $\epsilon$-approximate unitary $k$-design if

\[ \|\mathcal{G}_W - \mathcal{G}_H\|_\diamond \le \epsilon, \tag{2.8} \]

where $\mathcal{G}_W$ and $\mathcal{G}_H$ are defined in Definition 2.2.

In Ref. [10], they consider approximate twirling, which is implemented using an approximate 2-design.
They give an alternative definition of closeness which is more convenient for this application:

Definition 2.6 ([10]). Let $\{p_i, U_i\}$ be an ensemble of unitary operators. Then this ensemble is an
$\epsilon$-approximate twirl if

\[ \max_\Lambda \left\| \mathbb{E}_W\, W(\Lambda(W^\dagger \rho W))W^\dagger - \mathbb{E}_U\, U(\Lambda(U^\dagger \rho U))U^\dagger \right\|_\diamond \le \frac{\epsilon}{d^2} \tag{2.9} \]

where the first expectation is over $W$ chosen from the ensemble and the second is the Haar average.
The maximisation is over channels $\Lambda$ and $d$ is the dimension ($2^n$ in our case).

Our results work for both definitions with the same efficiency.



2.4 Random Circuits as $k$-designs

If a random circuit is to be an approximate $k$-design then Eqn. 2.8 must be satisfied where the $U_i$
are the different possible random circuits. We can think of this as applying the random circuit not
once but $k$ times to $k$ different systems.

Suppose that applying $t$ random gates yields the random circuit $W$. If $W^{\otimes k}$ acts on an $nk$-qubit
state $\rho$, then following the notation of Eqn. 2.3 the resulting state is

\[ \rho_W := W^{\otimes k} \rho\, (W^\dagger)^{\otimes k} = 2^{-nk/2} \sum_{p_1, \ldots, p_k} \gamma_0(p_1, \ldots, p_k)\, W\sigma_{p_1}W^\dagger \otimes \cdots \otimes W\sigma_{p_k}W^\dagger. \tag{2.10} \]

For this to be a $k$-design, the expectation over all choices of random circuit should match the
expectation over Haar-distributed $W \in U(2^n)$.

We are now ready to state our main results. Our results apply to a large class of gate sets which
we define below:

Definition 2.7. Let $\mathcal{E} = \{p_i, U_i\}$ be a discrete ensemble of elements from $U(d)$. Define an operator
$G_{\mathcal{E}}$ by

\[ G_{\mathcal{E}} := \sum_i p_i\, U_i^{\otimes k} \otimes (U_i^*)^{\otimes k}. \tag{2.11} \]

More generally, we can consider continuous distributions. If $\mu$ is a probability measure on $U(d)$
then we can define $G_\mu$ by analogy as

\[ G_\mu := \int_{U(d)} d\mu(U)\, U^{\otimes k} \otimes (U^*)^{\otimes k}. \tag{2.12} \]

Then $\mathcal{E}$ (or $\mu$) is $k$-copy gapped if $G_{\mathcal{E}}$ (or $G_\mu$) has only $k!$ eigenvalues with absolute value equal to
1.

For any discrete ensemble $\mathcal{E} = \{p_i, U_i\}$, we can define a measure $\mu = \sum_i p_i \delta_{U_i}$. Thus, it suffices to
state our theorems in terms of $\mu$ and $G_\mu$.

The condition on $G_\mu$ in the above definition may seem somewhat strange. We will see in Section 3
that when $d > k$ there is a $k!$-dimensional subspace of $(\mathbb{C}^d)^{\otimes 2k}$ that is acted upon trivially by any
$G_\mu$. Additionally, when $\mu$ is the Haar measure on $U(d)$ then $G_\mu$ is the projector onto this space.
Thus, the $k$-copy gapped condition implies that vectors orthogonal to this space are shrunk by $G_\mu$.

We will see that $G_\mu$ is $k$-copy gapped in a number of important cases. First, we give a definition
of universality that can apply not only to discrete gate sets, but to arbitrary measures on $U(4)$.

Definition 2.8. Let $\mu$ be a distribution on $U(4)$. Suppose that for any open ball $S \subset U(4)$ there
exists a positive integer $\ell$ such that $\mu^{\star\ell}(S) > 0$. Then we say $\mu$ is universal [for $U(4)$].

Here $\mu^{\star\ell}$ is the $\ell$-fold convolution of $\mu$ with itself; i.e.

\[ \mu^{\star\ell} = \int \delta_{U_1 \cdots U_\ell} \, d\mu(U_1) \cdots d\mu(U_\ell). \]

When $\mu$ is a discrete distribution over a set $\{U_i\}$, Definition 2.8 is equivalent to the usual definition
of universality for a finite set of unitary gates.

Theorem 2.9. The following distributions on $U(4)$ are $k$-copy gapped:

(i) Any universal gate set. Examples are $U(4)$ itself, any entangling gate together with all single
qubit gates, or the gate set considered in [26].

(ii) Any approximate (or exact) unitary $k$-design on 2 qubits, such as the uniform distribution
over the 2-qubit Clifford group, which is an exact 2-design.

Proof.

(i) This is proven in Lemma 3.7.

(ii) This follows straight from Definition 2.2.

□

Theorem 2.10. Let $\mu$ be a 2-copy gapped distribution and $W$ be a random circuit on $n$ qubits
obtained by drawing $t$ random unitaries according to $\mu$ and applying each of them to a random
pair of qubits. Then there exists $C$ (depending only on $\mu$) such that for any $\epsilon > 0$ and any
$t \ge C(n(n + \log 1/\epsilon))$, $\mathcal{G}_W$ is an $\epsilon$-approximate unitary 2-design according to either Definition 2.5
or Definition 2.6.

To prove Theorem 2.10 we show that the second moments of the random circuits converge quickly
to those of a uniform Haar distributed unitary. For $W$ a circuit as in Theorem 2.10, write $\gamma_W(p_1, p_2)$
for the Pauli coefficients of $\rho_W = W^{\otimes 2} \rho\, (W^\dagger)^{\otimes 2}$. Then write $\gamma_t(p_1, p_2) = \mathbb{E}_W \gamma_W(p_1, p_2)$ where $W$
is a circuit of length $t$. Then we have

Lemma 2.11. Let $\mu$ and $W$ be as in Theorem 2.10. Let the initial state be $\rho$ with $\gamma_0(p,p) \ge 0$ and
$\sum_p \gamma_0(p,p) = 1$ (for example the state $|\psi\rangle\langle\psi| \otimes |\psi\rangle\langle\psi|$ for any pure state $|\psi\rangle$). Then there exists a
constant $C$ (possibly depending on $\mu$) such that for any $\epsilon > 0$

(i)

\[ \sum_{\substack{p_1, p_2 \\ p_1 p_2 \ne 00}} \left( \gamma_t(p_1, p_2) - \delta_{p_1 p_2} \frac{1}{2^n(2^n + 1)} \right)^2 \le \epsilon \tag{2.13} \]

for $t \ge Cn \log 1/\epsilon$.

(ii)

\[ \sum_{\substack{p_1, p_2 \\ p_1 p_2 \ne 00}} \left| \gamma_t(p_1, p_2) - \delta_{p_1 p_2} \frac{1}{2^n(2^n + 1)} \right| \le \epsilon \tag{2.14} \]

for $t \ge Cn(n + \log 1/\epsilon)$ or, when $\mu$ is the uniform distribution on $U(4)$ or its stabiliser
subgroup, $t \ge Cn \log \frac{n}{\epsilon}$.

We can then extend this to all states by a simple corollary:

Corollary 2.12. Let $\mu$, $W$ and $\gamma_W$ be as in Lemma 2.11. Then, for any initial state $\rho =
2^{-n} \sum_{p_1, p_2} \gamma_0(p_1, p_2)\, \sigma_{p_1} \otimes \sigma_{p_2}$, there exists a constant $C$ (possibly depending on $\mu$) such that for
any $\epsilon > 0$

(i)

\[ \sum_{\substack{p_1, p_2 \\ p_1 p_2 \ne 00}} \left( \gamma_t(p_1, p_2) - \delta_{p_1 p_2} \frac{\sum_{p \ne 0} \gamma_0(p,p)}{4^n - 1} \right)^2 \le \epsilon \tag{2.15} \]

for $t \ge Cn(n + \log 1/\epsilon)$.

(ii)

\[ \sum_{\substack{p_1, p_2 \\ p_1 p_2 \ne 00}} \left| \gamma_t(p_1, p_2) - \delta_{p_1 p_2} \frac{\sum_{p \ne 0} \gamma_0(p,p)}{4^n - 1} \right| \le \epsilon \tag{2.16} \]

for $t \ge Cn(n + \log 1/\epsilon)$.



By the usual definition of an approximate design (Definition 2.5), we only need convergence in the
2-norm (Eqn. 2.15), which is implied by 1-norm convergence (Eqn. 2.16) but weaker. However,
Definition 2.6, which requires the map to be close to the twirling operation, requires 1-norm
convergence (i.e. Eqn. 2.16). Thus, Theorem 2.10 for Definition 2.5 follows from Corollary 2.12(i) and
Theorem 2.10 for Definition 2.6 follows from Corollary 2.12(ii). Theorem 2.10 is proved in Section
6 and Corollary 2.12 in Section 4.

We note that, in the course of proving Lemma 2.11, we prove that the eigenvalue gap (defined in
Section 4.3) of the Markov chain that gives the evolution of the $\gamma(p,p)$ terms is $O(1/n)$. It is easy
to show that this bound is tight for some gate sets.

Related work: Here we summarise the other efficient constructions of approximate unitary 2-designs.

• The uniform distribution over the Clifford group on $n$ qubits is an exact 2-design [14]. Moreover,
[14] described how to sample from the Clifford group using $O(n^8)$ classical gates and
$O(n^3)$ quantum gates. Our results show that applying $O(n(n + \log 1/\epsilon))$ random two-qubit
Clifford gates also achieves an $\epsilon$-approximate 2-design (although not necessarily a distribution
that is within $\epsilon$ of uniform on the Clifford group).

• Dankert et al. [10] gave a specific circuit construction of an approximate 2-design. To achieve
small error in the sense of Definition 2.5, their circuits require the same $O(n(n + \log 1/\epsilon))$
gates that our random circuits do. However, when we use Definition 2.6, the circuits from
[10] only need $O(n \log 1/\epsilon)$ gates while the random circuits analysed in this paper need to be
length $O(n(n + \log 1/\epsilon))$.

• The closest results to our own are in the papers by Oliveira et al. [26, 9], which considered
a specific gate set (random single qubit gates and a controlled-NOT) and proved that the
second moments converge in time $O(n^2(n + \log 1/\epsilon))$. Our strategy of analysing random
quantum circuits in terms of classical Markov chains is also adapted from [26, 9]. In Section
3 we generalise this approach to analyse the $k$-th moments for arbitrary $k$.

The main results of our paper extend the results of [26, 9] to a larger class of gate sets and
improve their convergence bounds. Some of these improvements have been conjectured by
[30], which presented numerical evidence in support of them.

3 Analysis of the Moments 

In order to prove our results, we need to understand how the state evolves after each step of the 
random circuit. In this section we consider just one step and a fixed pair of qubits. Later on we 
will extend this to prove convergence results for multiple steps with random pairs of qubits drawn 
at every step. We consider first the Haar distribution over the full unitary group and then will 
discuss the more general case of any 2-copy gapped distribution. 

In this section, we work in general dimension $d$ and with a general Hermitian orthogonal basis
$\sigma_0, \ldots, \sigma_{d^2-1}$. Later we will take $d$ to be either 4 or $2^n$ and the $\sigma_i$ to be Pauli matrices. However,
in this section we keep the discussion general to emphasise the potentially broader applications.

Fix an orthonormal basis for $d \times d$ Hermitian matrices: $\sigma_0, \ldots, \sigma_{d^2-1}$, normalised so that $\operatorname{tr} \sigma_p \sigma_q =
d\,\delta_{p,q}$. Let $\sigma_0$ be the identity. We need to evaluate the quantity

\[ \mathbb{E}_U \left( U^{\otimes k}\, \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k}\, (U^\dagger)^{\otimes k} \right) =: T(\mathbf{p}) \tag{3.1} \]

where the expectation is over Haar distributed $U \in U(d)$. We will need this quantity in two cases.
Firstly, for $d = 2^n$, these are the moments obtained after applying a uniformly distributed unitary
so we know what the random circuit must converge to. Secondly, for $d = 4$, this tells us how a
random $U(4)$ gate acts on any chosen pair.

Call the quantity in Eqn. 3.1 $T(\mathbf{p})$ (we use bold to indicate a $k$-tuple of coefficients; take $\mathbf{p} =
(p_1, \ldots, p_k)$) and write it in the $\sigma_p$ basis as

\[ T(\mathbf{p}) = \sum_{\mathbf{q}} \hat{G}(\mathbf{q}; \mathbf{p})\, \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k}. \tag{3.2} \]

Here, $\hat{G}(\mathbf{q}; \mathbf{p})$ is the coefficient in the Pauli expansion of $T(\mathbf{p})$ and we define $\hat{G}$ as the matrix with
entries equal to $\hat{G}(\mathbf{q}; \mathbf{p})$. We have left off the usual normalisation factor because, as we shall see,
with this normalisation $\hat{G}$ is a projector. Inverting this, we have

\[ \hat{G}(\mathbf{q}; \mathbf{p}) = d^{-k} \operatorname{tr} \left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k}\, T(\mathbf{p}) \right)
 = d^{-k}\, \mathbb{E}_U \operatorname{tr} \left( (\sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k})\, U^{\otimes k} (\sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k}) (U^\dagger)^{\otimes k} \right). \tag{3.3} \]

Note that $\hat{G}$ is real since $T$ and the basis are Hermitian.

We can gain all the information we need about the Haar integral in Eqn. 3.1 with the following
observations:
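
Lemma 3.1. $T(\mathbf{p})$ commutes with $U^{\otimes k}$ for any unitary $U$.

Proof. Follows from the invariance of the Haar measure on the unitary group. □

Corollary 3.2. $T(\mathbf{p})$ is a linear combination of permutations from the symmetric group $S_k$.

Proof. This follows from Schur-Weyl duality (see e.g. [H>J). □

From this, we can prove that $\hat{G}$ is a projector and find its eigenvectors.

Theorem 3.3. $\hat{G}$ is symmetric, i.e. $\hat{G}(\mathbf{q}; \mathbf{p}) = \hat{G}(\mathbf{p}; \mathbf{q})$.

Proof. Follows from the invariance of the trace under cyclic permutations. □

Theorem 3.4. $P_\pi$ is an eigenvector of $\hat{G}$ with eigenvalue 1 for any permutation operator $P_\pi$, i.e.

\[ \sum_{\mathbf{q}} \hat{G}(\mathbf{p}; \mathbf{q}) \operatorname{tr} \left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} P_\pi \right) = \operatorname{tr} \left( \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k} P_\pi \right). \]

Further, any vector orthogonal to this set has eigenvalue 0.

Proof. For the first part,

\[ \begin{aligned}
\sum_{\mathbf{q}} \hat{G}(\mathbf{p}; \mathbf{q}) \operatorname{tr} \left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} P_\pi \right)
 &= d^{-k} \sum_{\mathbf{q}} \mathbb{E}_U \operatorname{tr} \left( \sigma_{q_1} U \sigma_{p_1} U^\dagger \right) \cdots \operatorname{tr} \left( \sigma_{q_k} U \sigma_{p_k} U^\dagger \right) \operatorname{tr} \left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} P_\pi \right) \\
 &= d^{-k} \operatorname{tr} \left( P_\pi\, \mathbb{E}_U \sum_{q_1} \operatorname{tr} \left( \sigma_{q_1} U \sigma_{p_1} U^\dagger \right) \sigma_{q_1} \otimes \cdots \otimes \sum_{q_k} \operatorname{tr} \left( \sigma_{q_k} U \sigma_{p_k} U^\dagger \right) \sigma_{q_k} \right).
\end{aligned} \tag{3.4} \]

Writing $U \sigma_p U^\dagger$ in the $\sigma_p$ basis, we find

\[ \frac{1}{d} \sum_q \operatorname{tr} \left( \sigma_q U \sigma_p U^\dagger \right) \sigma_q = U \sigma_p U^\dagger. \]

Therefore Eqn. 3.4 becomes

\[ \operatorname{tr} \left( P_\pi\, \mathbb{E}_U\, U \sigma_{p_1} U^\dagger \otimes \cdots \otimes U \sigma_{p_k} U^\dagger \right) = \operatorname{tr} \left( \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k} P_\pi \right). \]

For the second part, consider any vector $v$ which is orthogonal to the permutation operators (we
can neglect the complex conjugate because $P_\pi$ is real in this basis), i.e.

\[ \sum_{\mathbf{q}} \operatorname{tr} \left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} P_\pi \right) v(\mathbf{q}) = 0 \tag{3.5} \]

for any permutation $\pi$. Then

\[ \sum_{\mathbf{q}} \hat{G}(\mathbf{p}; \mathbf{q})\, v(\mathbf{q}) = d^{-k} \sum_{\mathbf{q}} \operatorname{tr} \left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} T(\mathbf{p}) \right) v(\mathbf{q}) \]

which is zero since $T(\mathbf{p})$ is a linear combination of permutations and $v$ is orthogonal to this by
Eqn. 3.5. □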
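
Theorem 3.5. $\hat{G}^2 = \hat{G}$, i.e. $\sum_{\mathbf{q}'} \hat{G}(\mathbf{p}; \mathbf{q}') \hat{G}(\mathbf{q}'; \mathbf{q}) = \hat{G}(\mathbf{p}; \mathbf{q})$.

Proof. Using Eqn. 3.3,

\[ \sum_{\mathbf{q}'} \hat{G}(\mathbf{p}; \mathbf{q}') \hat{G}(\mathbf{q}'; \mathbf{q}) = \sum_{\mathbf{q}'} \hat{G}(\mathbf{p}; \mathbf{q}')\, d^{-k} \operatorname{tr} \left( \sigma_{q'_1} \otimes \cdots \otimes \sigma_{q'_k} T(\mathbf{q}) \right). \]

From Corollary 3.2, $T(\mathbf{q})$ is a linear combination of permutations. This implies, using Theorem 3.4,
that

\[ \sum_{\mathbf{q}'} \hat{G}(\mathbf{p}; \mathbf{q}')\, d^{-k} \operatorname{tr} \left( \sigma_{q'_1} \otimes \cdots \otimes \sigma_{q'_k} T(\mathbf{q}) \right) = d^{-k} \operatorname{tr} \left( \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k} T(\mathbf{q}) \right) = \hat{G}(\mathbf{p}; \mathbf{q}) \]

as required. □

Corollary 3.6. $\hat{G}$ is a projector so has eigenvalues 0 and 1.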

We now evaluate $\hat{G}$ and $T$ for the cases of $k = 1$ and $k = 2$ since these are the cases we are interested
in for the remainder of the paper.

3.1 $k = 1$

The $k = 1$ case is clear: the random unitary completely randomises the state. Therefore all terms
in the expansion are set to zero apart from the identity, i.e.

\[ T(p) = \begin{cases} \sigma_0 & p = 0 \\ 0 & p \ne 0. \end{cases} \tag{3.6} \]
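
The $k = 1$ statement can be checked on a finite example. The sketch below relies on a standard fact that is not stated in the text and is therefore flagged as an assumption: the four single-qubit Pauli matrices, taken with equal weight, form an exact unitary 1-design, so averaging $U \sigma_p U^\dagger$ over them should reproduce the Haar average $T(p)$ for $d = 2$. The helper names are hypothetical.

```python
# The four single-qubit Pauli matrices sigma_0..sigma_3.
PAULIS = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def twirl(ensemble, x):
    """Uniform average of U x U^dagger over a finite ensemble of unitaries."""
    n = len(x)
    total = [[0j] * n for _ in range(n)]
    for u in ensemble:
        term = matmul(matmul(u, x), dagger(u))
        total = [[total[i][j] + term[i][j] / len(ensemble) for j in range(n)]
                 for i in range(n)]
    return total

# T_finite[p] plays the role of T(p) in Eqn. (3.6) for d = 2.
T_finite = {p: twirl(PAULIS, sigma_p) for p, sigma_p in enumerate(PAULIS)}
```

Under the assumption above, `T_finite[0]` is the identity and `T_finite[p]` vanishes for `p` in 1, 2, 3, which is the pattern of Eqn. (3.6).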

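3.2 $k = 2$

For $k = 2$, there are just two permutation operators, identity $I$ and swap $\mathcal{F}$. Therefore there are
just two eigenvectors with non-zero eigenvalue ($n > 1$). In normalised form, taking them to be
orthogonal, their components are

\[ f_1(q_1, q_2) = \delta_{q_1 0}\, \delta_{q_2 0} \]

\[ f_2(q_1, q_2) = \frac{1}{\sqrt{d^2 - 1}}\, \delta_{q_1 q_2} (1 - \delta_{q_1 0}). \]

We will now prove three properties of $\hat{G}$ that we need:

1. $\hat{G}(p_1, p_2; q_1, q_2) = 0$ if $p_1 \ne p_2$ or $q_1 \ne q_2$.

Proof. Consider the function $\gamma(q_1, q_2) = \delta_{q_1 a} \delta_{q_2 b}$ with $a \ne b$. This function has zero overlap
with the eigenvectors $f_1$ and $f_2$ so it goes to zero when acted on by $\hat{G}$. Therefore
$\hat{G}(p_1, p_2; a, b) = 0$. The claim follows from the symmetry property (Theorem 3.3). □

With this we will write $\hat{G}(p; q) = \hat{G}(p, p; q, q)$.

2. $\hat{G}(p; 0) = \delta_{p0}$.

Proof. Let $\hat{G}$ act on eigenvector $f_1$. □

3. $\hat{G}(p; a) = \frac{1}{d^2 - 1}$ for $a, p \ne 0$.

Proof. Let $\hat{G}$ act on the input $\delta_{qa}$. This has zero overlap with $f_1$ and overlap $\frac{1}{\sqrt{d^2-1}}$ with
$f_2$. □

Therefore we have

\[ \hat{G}(p_1, p_2; q_1, q_2) = \begin{cases} 0 & p_1 \ne p_2 \text{ or } q_1 \ne q_2 \\ 1 & p_1 = p_2 = q_1 = q_2 = 0 \\ \frac{1}{d^2 - 1} & p_1 = p_2 \ne 0,\ q_1 = q_2 \ne 0. \end{cases} \tag{3.7} \]

Since $T(p_1, p_2) = \sum_{q_1, q_2} \hat{G}(p_1, p_2; q_1, q_2)\, \sigma_{q_1} \otimes \sigma_{q_2}$, we have

\[ T(p_1, p_2) = \begin{cases} 0 & p_1 \ne p_2 \\ \sigma_0 \otimes \sigma_0 & p_1 = p_2 = 0 \\ \frac{1}{d^2 - 1} \sum_{p' \ne 0} \sigma_{p'} \otimes \sigma_{p'} & p_1 = p_2 \ne 0. \end{cases} \tag{3.8} \]

Therefore the terms $\sigma_{p_1} \otimes \sigma_{p_2}$ with $p_1 \ne p_2$ are set to zero. Further, the sum of the diagonal
coefficients $\gamma(p,p)$ is conserved. This allows us to identify this with a probability distribution (after
renormalising) and use Markov chain analysis. To see this, write again the starting state

\[ \rho = \frac{1}{d} \sum_{q_1, q_2} \gamma_0(q_1, q_2)\, \sigma_{q_1} \otimes \sigma_{q_2} \]

with state after application of any unitary $W$

\[ \rho_W = \frac{1}{d} \sum_{q_1, q_2} \gamma_0(q_1, q_2)\, W \sigma_{q_1} W^\dagger \otimes W \sigma_{q_2} W^\dagger = \frac{1}{d} \sum_{q_1, q_2} \gamma_W(q_1, q_2)\, \sigma_{q_1} \otimes \sigma_{q_2}. \]

Then

\[ \begin{aligned}
\sum_q \gamma_W(q, q) &= \operatorname{tr} (\mathcal{F} \rho_W) \\
 &= \frac{1}{d} \sum_{q_1, q_2} \gamma_0(q_1, q_2) \operatorname{tr} \left( \mathcal{F}\, (W \sigma_{q_1} W^\dagger) \otimes (W \sigma_{q_2} W^\dagger) \right) \\
 &= \frac{1}{d} \sum_{q_1, q_2} \gamma_0(q_1, q_2) \operatorname{tr} (\sigma_{q_1} \sigma_{q_2}) \\
 &= \sum_q \gamma_0(q, q)
\end{aligned} \]

as required, where $\mathcal{F}$ is the swap operator and we have used Lemmas A.2 and A.1.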

3.3 Moments for General Universal Random Circuits

We now consider universal distributions $\mu$ that in general may be different from the uniform (Haar)
measure on $U(d)$. Our main result in this section will be to show that a universal distribution on
$U(4)$ is also 2-copy gapped. In fact, we will phrase this result in slightly more general terms and
show that a universal distribution on $U(d)$ is also $k$-copy gapped for any $k$. Universality (Definition
2.8) generalises in the obvious way to $U(d)$, whereas when we say that $\mu$ is $k$-copy gapped, we mean
that

\[ \|G_\mu - G_{U(d)}\|_\infty < 1, \tag{3.9} \]

where $G_\mu$ and $G_{U(d)}$ both denote $\mathbb{E}_U\, U^{\otimes k} \otimes (U^*)^{\otimes k}$, with the expectation taken over $\mu$ for $G_\mu$ or
over the Haar measure for $G_{U(d)}$.

The reason Eqn. 3.9 represents our condition for $\mu$ to be $k$-copy gapped is as follows: Observe that
$\hat{G}$ (the coefficient matrix of Eqn. 3.2) and $G$ are unitarily related, so the definition of $k$-copy gapped
could equivalently be given in terms of $\hat{G}$. We have shown above that $\hat{G}_{U(d)}$ (and thus $G_{U(d)}$) has all
eigenvalues equal to 0 or 1; i.e. is a projector. By contrast, $G_\mu$ may not even be Hermitian. However,
we will prove below that all eigenvectors of $G_{U(d)}$ with eigenvalue 1 are also eigenvectors of $G_\mu$ with
eigenvalue 1. Thus, Eqn. 3.9 will imply that $\lim_{t \to \infty} (G_\mu)^t = G_{U(d)}$, just as we would expect for a
gapped random walk.

We would like to show that Eqn. 3.9 holds whenever $\mu$ is universal. This result was proved in [6]
(and was probably known even earlier) when $\mu$ had the form $(\delta_{U_1} + \delta_{U_2})/2$. Here we show how to
extend the argument to any universal $\mu$.

Lemma 3.7. Let $\mu$ be a distribution on $U(d)$. Then all eigenvectors of $G_{U(d)}$ with eigenvalue 1
are eigenvectors of $G_\mu$ with eigenvalue one. Additionally, if $\mu$ is universal then $\mu$ is $k$-copy gapped
for any positive integer $k$ (cf. Eqn. 3.9).

In particular, if $k = 2$ this Lemma implies that $\mu$ is 2-copy gapped (cf. Theorem 2.9).

Proof. Let $V \cong \mathbb{C}^d$ be the fundamental representation of $U(d)$, where the action of $U \in U(d)$
is simply $U$ itself. Let $V^*$ be its dual representation, where $U$ acts as $U^*$. The operators $G_\mu$
and $G_{U(d)}$ act on the space $V^{\otimes k} \otimes (V^*)^{\otimes k}$. We will see that $G_{U(d)}$ is completely determined by
the decomposition of $V^{\otimes k} \otimes (V^*)^{\otimes k}$ into irreducible representations (irreps). Suppose that the
multiplicity of $(r_\lambda, V_\lambda)$ in $V^{\otimes k} \otimes (V^*)^{\otimes k}$ is $m_\lambda$, where the $V_\lambda$'s are the irrep spaces and $r_\lambda(U)$ the
corresponding representation matrices. In other words

\[ V^{\otimes k} \otimes (V^*)^{\otimes k} \cong \bigoplus_\lambda V_\lambda \otimes \mathbb{C}^{m_\lambda} \tag{3.10} \]

\[ U^{\otimes k} \otimes (U^*)^{\otimes k} \sim \sum_\lambda |\lambda\rangle\langle\lambda| \otimes r_\lambda(U) \otimes I_{m_\lambda}. \tag{3.11} \]

Here $\sim$ indicates that the two sides are related by conjugation by a fixed ($U$-independent) unitary.

Let $\lambda = 0$ denote the trivial irrep: i.e. $V_0 = \mathbb{C}$ and $r_0(U) = 1$ for all $U$. We claim that $\mathbb{E}_U r_\lambda(U) = 0$
whenever $\lambda \ne 0$ and the expectation is taken over the Haar measure. To show this, note that
$\mathbb{E}_U r_\lambda(U)$ commutes with $r_\lambda(V)$ for all $V \in U(d)$ and thus, by Schur's Lemma, we must have
$\mathbb{E}_U r_\lambda(U) = cI$ for some $c \in \mathbb{C}$. However, by the translation-invariance of the Haar measure we have
$cI = \mathbb{E}_U r_\lambda(U) = \mathbb{E}_U r_\lambda(UV) = c\, r_\lambda(V)$ for all $V \in U(d)$. Since $\lambda \ne 0$, we cannot have $r_\lambda(V) = I$
for all $V$ and so it must be that $c = 0$.
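
Thus, if we write $G_{U(d)}$ and $G_\mu$ using the basis on the RHS of Eqn. 3.11, we have

\[ G_{U(d)} = |0\rangle\langle 0| \otimes I_{m_0} \tag{3.12} \]

where $|0\rangle\langle 0|$ is a projector onto the trivial irrep. On the other hand,

\[ G_\mu = |0\rangle\langle 0| \otimes I_{m_0} + \sum_{\lambda \ne 0} |\lambda\rangle\langle\lambda| \otimes \left( \int r_\lambda(U) \, d\mu(U) \right) \otimes I_{m_\lambda}. \tag{3.13} \]

Thus, every eigenvector of $G_{U(d)}$ with eigenvalue one is also fixed by $G_\mu$. For the remainder of the
space, the direct sum structure means that

\[ \|G_{U(d)} - G_\mu\|_\infty = \max_{\lambda \ne 0} \left\| \int r_\lambda(U) \, d\mu(U) \right\|_\infty. \tag{3.14} \]

Note that this maximisation only includes $\lambda$ with $\dim V_\lambda > 1$. This is because non-trivial one-dimensional
irreps of $U(d)$ have the form $(\det U)^m$ for some non-zero integer $m$. Under the map
$U \mapsto e^{i\phi} U$, such irreps pick up a phase of $e^{imd\phi}$. However, $U^{\otimes k} \otimes (U^*)^{\otimes k}$ is invariant under $U \mapsto e^{i\phi} U$.
Thus $V^{\otimes k} \otimes (V^*)^{\otimes k}$ cannot contain any non-trivial one-dimensional irreps.

Now suppose by contradiction that there exists $\lambda \ne 0$ with $m_\lambda \ne 0$ and $\| \int r_\lambda(U) \, d\mu(U) \|_\infty = 1$.
(We do not need to consider the case $\| \int r_\lambda(U) \, d\mu(U) \|_\infty > 1$, since $\| r_\lambda(U) \|_\infty = 1$ for all $U$ and
$\| \cdot \|_\infty$ obeys the triangle inequality.) Indeed, the triangle inequality further implies that there exists
a unit vector $|v\rangle \in V_\lambda$ such that

\[ \int d\mu(U)\, r_\lambda(U) |v\rangle = \omega |v\rangle \]

for some $\omega \in \mathbb{C}$ with $|\omega| = 1$.

By the above argument we can assume that $\dim V_\lambda > 1$. Since $V_\lambda$ is irreducible, it cannot contain
a one-dimensional invariant subspace, implying that there exists $U_0 \in U(d)$ such that

\[ |\langle v | r_\lambda(U_0) | v \rangle| = 1 - \delta, \]

for some $\delta > 0$. Since $U \mapsto |\langle v | r_\lambda(U) | v \rangle|$ is continuous, there exists an open ball $S$ around $U_0$ such
that $|\langle v | r_\lambda(U) | v \rangle| < 1 - \delta/2$ for all $U \in S$. Define $\bar{S} := U(d) \setminus S$.

Now we use the fact that $\mu$ is universal to find an $\ell$ such that $\mu^{\star\ell}(S) > 0$. Next, observe that
$\int d\mu^{\star\ell}(U)\, \langle v | r_\lambda(U) | v \rangle = \omega^\ell$. Taking the absolute value of both sides yields

\[ \begin{aligned}
1 &= \left| \int d\mu^{\star\ell}(U)\, \langle v | r_\lambda(U) | v \rangle \right| \\
 &\le \int_{U(d)} d\mu^{\star\ell}(U)\, |\langle v | r_\lambda(U) | v \rangle| \\
 &= \int_S d\mu^{\star\ell}(U)\, |\langle v | r_\lambda(U) | v \rangle| + \int_{\bar{S}} d\mu^{\star\ell}(U)\, |\langle v | r_\lambda(U) | v \rangle| \\
 &\le \mu^{\star\ell}(S) (1 - \delta/2) + (1 - \mu^{\star\ell}(S)) \\
 &< 1,
\end{aligned} \]

a contradiction. We conclude that $\|G_{U(d)} - G_\mu\|_\infty < 1$.

□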
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4 Convergence 

In Section 3 we saw that iterating any universal gate set on $U(d)$ eventually converges to the
uniform distribution on $U(d)$. Since the set of all two-qubit unitaries is universal on $U(2^n)$, this
implies that random circuits eventually converge to the Haar measure. In this section, we turn to
proving upper bounds on this convergence rate, focusing on the first two moments.

Let $G^{(ij)}$ be the matrix with $G$ (with $d = 4$) acting on qubits $i$ and $j$ and the identity on the others.
Then, if the pair $(i,j)$ is chosen at step $t$, we can find the coefficients at step $t + 1$ by multiplying
by $G^{(ij)}$. In general, a random pair is chosen at each step. So

$$\gamma_{t+1} = \frac{1}{n(n-1)} \sum_{i \neq j} G^{(ij)} \gamma_t \tag{4.1}$$

where $\gamma_{t+1}$ are the expected coefficients at step $t + 1$. We can think of this evolution as repeated
application of the matrix

$$P = \frac{1}{n(n-1)} \sum_{i \neq j} G^{(ij)}. \tag{4.2}$$
For $k = 2$, the key idea of Oliveira et al. [26] was to map the evolution of the $\gamma(p,p)$ coefficients to
a Markov chain. The $\gamma(p_1,p_2)$ coefficients with $p_1 \neq p_2$ just decay as each qubit is chosen and can
be analysed directly.

However, we can only map the $\gamma(p,p)$ coefficients to a probability distribution when they are
non-negative, which is not the case for general states. Most of the rest of the paper is dedicated to
proving Lemma 2.11, which only applies to states with $\gamma(p,p) \ge 0$ and normalised so their sum is
1. Corollary 2.12 then extends this to all states:

Proof of Corollary 2.12. Lemma 2.11 still applies to the $\gamma(p_1,p_2)$ terms with $p_1 \neq p_2$. Therefore we just
need to show how to apply Lemma 2.11 to states that initially have some negative $\gamma(p,p)$ terms.

For the $\gamma(p,p)$ terms, Lemma 2.11 says that the random walk starting with any initial probability
distribution converges to uniform in some bounded time $t$. Let $g_t(p,p;q,q)$ be the coefficients
after $t$ steps of the walk starting at a particular point $q$ (i.e. $g_0(p,p;q,q) = \delta_{p,q}$). Now, for any
starting state $\rho$, let the initial coefficients be $\gamma_0(p,p)$. Then, by linearity, we can write the expected
coefficients after $t$ steps $\gamma_t(p,p) := \mathbb{E}_W \gamma_W(p,p)$ as




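$$\gamma_t(p,p) = \sum_{q} \gamma_0(q,q)\, g_t(p,p;q,q) = \sum_{q \neq 0} \gamma_0(q,q)\, g_t(p,p;q,q) \tag{4.3}$$

for $p \neq 0$.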

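We can now prove convergence rates for the expected coefficients $\gamma_t(p,p)$:

(i) For the 2-norm, we have from Lemma 2.11 that for $t \ge Cn \log 1/\epsilon$

$$\sum_{p \neq 0} \left( g_t(p,p;q,q) - \frac{1}{4^n - 1} \right)^2 \le \epsilon \tag{4.4}$$

for any $q$. Note that the normalisation for the $\gamma(p,p)$ terms with $p \neq 0$ has changed from
Lemma 2.11 since we are neglecting the $\gamma(0,0)$ term here. Now

$$\begin{aligned}
\sum_{p \neq 0} \left( \gamma_t(p,p) - \frac{\sum_{q \neq 0} \gamma_0(q,q)}{4^n - 1} \right)^2
 &= \sum_{p \neq 0} \left( \sum_{q \neq 0} \gamma_0(q,q) \left( g_t(p,p;q,q) - \frac{1}{4^n - 1} \right) \right)^2 \\
 &\le \sum_{q \neq 0} \gamma_0(q,q)^2 \sum_{p \neq 0,\, q \neq 0} \left( g_t(p,p;q,q) - \frac{1}{4^n - 1} \right)^2 \\
 &\le (4^n - 1)\, \epsilon \sum_{q \neq 0} \gamma_0(q,q)^2 \\
 &\le 4^n \epsilon \sum_{q_1, q_2} \gamma_0^2(q_1,q_2) \\
 &= 4^n \epsilon\, \mathrm{tr}\, \rho^2 \\
 &\le 4^n \epsilon
\end{aligned}$$

where the first inequality is the Cauchy-Schwarz inequality. Therefore for $t \ge Cn(n + \log 4^n/\epsilon)$,
the 2-norm distance from stationarity for the $\gamma(p,p)$ terms is at most $\epsilon$. Choose $C'$ such that
$C'n(n + \log 1/\epsilon) \ge Cn(n + \log 4^n/\epsilon)$ to obtain the result.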



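(ii) For the 1-norm, Lemma 2.11 says that for $t \ge Cn(n + \log 1/\epsilon)$

$$\sum_{p \neq 0} \left| g_t(p,p;q,q) - \frac{1}{4^n - 1} \right| \le \epsilon. \tag{4.5}$$

We can then proceed much as for the 2-norm case:

$$\begin{aligned}
\sum_{p \neq 0} \left| \gamma_t(p,p) - \frac{\sum_{q \neq 0} \gamma_0(q,q)}{4^n - 1} \right|
 &= \sum_{p \neq 0} \left| \sum_{q \neq 0} \gamma_0(q,q) \left( g_t(p,p;q,q) - \frac{1}{4^n - 1} \right) \right| \\
 &\le \sum_{q \neq 0} |\gamma_0(q,q)| \sum_{p \neq 0} \left| g_t(p,p;q,q) - \frac{1}{4^n - 1} \right| \\
 &\le \epsilon \sum_{q \neq 0} |\gamma_0(q,q)| \\
 &\le 2^n \epsilon.
\end{aligned}$$

The last inequality follows from $|\sigma_q \otimes \sigma_q| = \sigma_0 \otimes \sigma_0$. Therefore for
$t \ge Cn(n + \log 2^n/\epsilon)$, the 1-norm distance from stationarity for the $\gamma(p,p)$ terms is at most $\epsilon$. □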

We now proceed to prove Lemma 2.11. Firstly, we will consider the simple case of $k = 1$ to prove
this process forms a 1-design, as this will help us to understand the more complicated case of $k = 2$.

4.1 First Moments Convergence



Recall that $\rho = 2^{-n/2} \sum_p \gamma(p)\sigma_p$ and we wish to evaluate the moments of the coefficients.
So for the first moments to converge, we want to know $\mathbb{E}_W(\gamma_W(p))$.

For $k = 1$, the $U(4)$ random circuit uniformly randomises each pair that is chosen. More precisely,
a pair of sites $i,j$ are chosen at random and all the coefficients with $p_i \neq 0$ or $p_j \neq 0$ are set to
zero. Thus we get an exact 1-design when all sites have been hit. For other gate sets, the terms
do not decay to zero but decay by a factor depending on the gap of $G$. Call the gap $\Delta$; for $U(4)$
$\Delta = 1$ and for others $0 < \Delta < 1$ and $\Delta$ is independent of $n$. Therefore once each site has been hit
$m$ times the terms have decayed by a factor $(1 - \Delta)^m$.

For a bound like the mixing time (see Section 4.3 for definition), we want to bound the quantity
$\sum_{p \neq 0} |\mathbb{E}_W \gamma_W(p)|$ where $\gamma_W(p)$ is the Pauli coefficient after applying the random circuit $W$. We
also want 2-norm bounds, so we bound $\sum_{p \neq 0} (\mathbb{E}_W \gamma_W(p))^2$ too. We will in fact find bounds on
$\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)|$ and $\sum_{p \neq 0} (\mathbb{E}_W |\gamma_W(p)|)^2$, which are stronger.

A standard problem in the theory of randomised algorithms is the 'coupon collector' problem. If
a magazine comes with a free coupon, which is chosen uniformly randomly from $n$ different types,
how many magazines should you buy to have a high probability of getting all $n$ coupons? It is not
hard to show that $n \ln \frac{n}{\epsilon}$ samples (magazines) have at least a $1 - \epsilon$ probability of including all $n$
coupons. Using this, we expect all sites to be hit with probability at least $1 - \epsilon$ after $O(n \log \frac{n}{\epsilon})$
steps. This argument can be made precise in this context by bounding the non-identity coefficients.
We find, as expected, that the sum is small after $O(n \log n)$ steps:

Lemma 4.1. After $O(n \log 1/\epsilon)$ steps

$$\sum_{p \neq 0} \left( \mathbb{E}_W |\gamma_W(p)| \right)^2 \le \epsilon$$

and after $O(n \log \frac{n}{\epsilon})$ steps,

$$\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)| \le \epsilon. \tag{4.6}$$

Proof. At each step, a pair of sites is chosen at random and any terms with non-identity coefficients
for this pair decay by a factor $(1 - \Delta)$. For example, the term $\sigma_1 \otimes \sigma_0^{\otimes (n-1)}$ decays whenever the
first site is chosen. Thus the probability of each term decaying depends on the number of zeroes.
We start with the 1-norm bound.

Suppose the circuit applied after $t$ steps is $W_t$. Consider $\mathbb{E}_{W_t} |\gamma_{W_t}(p)|$ for any $p$ with $d$ non-zeroes.
Since the state $\rho$ is physical, $\mathrm{tr}\, \rho^2 \le 1$ so $\sum_p \gamma_0^2(p) \le 1$. Now, in each step, if any site is chosen where
$p$ is non-zero, this term decays by a factor $(1 - \Delta)$. This occurs with probability
$1 - \frac{(n-d)(n-d-1)}{n(n-1)} \ge d/n$, the probability of choosing a pair where at least one site is non-zero. Therefore

$$\mathbb{E} |\gamma_{W_t}(p)| \le \left( (1 - \Delta)\, d/n + (1 - d/n) \right) |\gamma_{W_{t-1}}(p)|$$

where the expectation is over the circuit applied at step $t$. If we iterate this $t$ times we find

$$\mathbb{E}_W |\gamma_W(p)| \le \exp(-\Delta t d/n)\, |\gamma_0(p)|$$

where the expectation here is over all random circuits for the $t$ steps. We now sum over all $p$:

$$\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)| \le \sum_{d=1}^{n} \exp(-\Delta t d/n) \sum_{d(p) = d} |\gamma_0(p)|$$

where $d(p)$ is the number of non-zeroes in $p$. For the 1-norm bound, we can simply bound $|\gamma_0(p)| \le 1$
to give $\sum_{d(p) = d} |\gamma_0(p)| \le \binom{n}{d} 3^d$ so

$$\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)| \le \left( 1 + 3\exp(-\Delta t/n) \right)^n - 1$$

where we have used the binomial theorem. Now let $t = \frac{n}{\Delta} \ln \frac{3n}{\epsilon}$. This gives

$$\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)| \le (1 + \epsilon/n)^n - 1 = O(\epsilon).$$

For the 2-norm bound,

$$\begin{aligned}
\sum_{p \neq 0} \left( \mathbb{E}_W |\gamma_W(p)| \right)^2
 &\le \sum_{p \neq 0} \exp(-2\Delta t\, d(p)/n)\, \gamma_0^2(p) \\
 &= \sum_{d=1}^{n} \exp(-2\Delta t d/n) \sum_{d(p) = d} \gamma_0^2(p) \\
 &\le \sum_{d=1}^{n} \exp(-2\Delta t d/n) \\
 &\le \frac{\exp(-2\Delta t/n)}{1 - \exp(-2\Delta t/n)}
\end{aligned}$$

where we have used $\sum_p \gamma_0^2(p) \le 1$. We find after $\frac{n}{2\Delta} \ln 1/\epsilon$ steps that

$$\sum_{p \neq 0} \left( \mathbb{E}_W |\gamma_W(p)| \right)^2 \le \frac{\epsilon}{1 - \epsilon}. \qquad □$$
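The two quantitative ingredients above, the coupon-collector estimate and the choice of $t$ in the proof of Lemma 4.1, can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names are hypothetical, and the coupon-collector simulation draws single coupons rather than pairs of sites, so it only illustrates the $n \ln \frac{n}{\epsilon}$ scaling.

```python
import math
import random


def draws_needed(n: int, eps: float) -> int:
    """Smallest integer t with t >= n * ln(n / eps)."""
    return math.ceil(n * math.log(n / eps))


def union_bound_miss_probability(n: int, t: int) -> float:
    """Union bound on P(some coupon type is missing after t uniform draws)."""
    # A fixed type is missed with probability (1 - 1/n)^t; sum over the n types.
    return n * (1.0 - 1.0 / n) ** t


def empirical_miss_rate(n: int, t: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the probability that t draws miss some type."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        seen = {rng.randrange(n) for _ in range(t)}
        if len(seen) < n:
            misses += 1
    return misses / trials


def one_norm_bound(n: int, gap: float, t: float) -> float:
    """Evaluate (1 + 3 exp(-gap * t / n))^n - 1, the 1-norm bound in the proof."""
    return (1.0 + 3.0 * math.exp(-gap * t / n)) ** n - 1.0


def steps_for_one_norm(n: int, gap: float, eps: float) -> float:
    """t = (n / gap) * ln(3n / eps), the choice made in the proof."""
    return (n / gap) * math.log(3.0 * n / eps)


def two_norm_bound(n: int, gap: float, t: float) -> float:
    """Evaluate exp(-2 gap t / n) / (1 - exp(-2 gap t / n))."""
    r = math.exp(-2.0 * gap * t / n)
    return r / (1.0 - r)


def steps_for_two_norm(n: int, gap: float, eps: float) -> float:
    """t = (n / (2 gap)) * ln(1 / eps), the choice made in the proof."""
    return (n / (2.0 * gap)) * math.log(1.0 / eps)


if __name__ == "__main__":
    n, gap, eps = 50, 0.3, 0.01
    print(one_norm_bound(n, gap, steps_for_one_norm(n, gap, eps)))
    print(two_norm_bound(n, gap, steps_for_two_norm(n, gap, eps)))
    print(union_bound_miss_probability(20, draws_needed(20, 0.05)))
```

With these choices the 1-norm expression equals $(1 + \epsilon/n)^n - 1$, which is at most $e^\epsilon - 1$, and the 2-norm expression equals $\epsilon/(1 - \epsilon)$.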

4.2 Second Moments Convergence 

Firstly, the $\sigma_{p_1} \otimes \sigma_{p_2}$ terms for $p_1 \neq p_2$ decay in a similar way to the non-identity terms in the
1-design analysis. In fact, the proof of Lemma 4.1 carries over almost identically to this case to
give

Lemma 4.2. After $O(n \log 1/\epsilon)$ steps

$$\sum_{p_1 \neq p_2} \left( \mathbb{E}_W |\gamma_W(p_1,p_2)| \right)^2 \le \epsilon$$

and after $O(n(n + \log 1/\epsilon))$ steps

$$\sum_{p_1 \neq p_2} \mathbb{E}_W |\gamma_W(p_1,p_2)| \le \epsilon.$$

Proof. Instead of the number of zeroes governing the decay rate, we need to count the number of
places where $p_1$ and $p_2$ differ. This gives

$$\mathbb{E} |\gamma_{W_t}(p_1,p_2)| \le \left( (1 - \Delta)\, d/n + (1 - d/n) \right) |\gamma_{W_{t-1}}(p_1,p_2)|$$

where now $d$ is the number of differing sites. There are $\binom{n}{d} 12^d 4^{n-d}$ states that differ in $d$ places so
we find

$$\sum_{p_1 \neq p_2} \mathbb{E}_W |\gamma_W(p_1,p_2)| \le 4^n \left[ \left( 1 + 3\exp(-\Delta t/n) \right)^n - 1 \right].$$

Set $t = \frac{n}{\Delta} (n \ln 4 + \ln 1/\epsilon)$ to make this $O(\epsilon)$. The 2-norm bound follows in the same way as for
Lemma 4.1. □

We now need to prove the $\gamma(p,p)$ terms converge quickly. We have seen above that the sum of the
terms $\gamma(p,p)$ is conserved and, for the purposes of proving Lemma 2.11, we assume the sum is 1
and $\gamma(p,p) \ge 0$ for all $p$.

To illustrate the evolution, consider the simplest case when the gates are chosen from $U(4)$. We
have evaluated $G$ in Section 3.2 for $k = 2$ for this case. Translated into coefficients this yields the
following update rule, where we have written it for the case when qubits 1 and 2 are chosen:

$$\gamma_{t+1}(r_1, r_2, r_3, \ldots, r_n, s_1, s_2, s_3, \ldots, s_n) =
\begin{cases}
0 & (r_1, r_2) \neq (s_1, s_2) \\
\gamma_t(0, 0, r_3, \ldots, r_n, 0, 0, s_3, \ldots, s_n) & (r_1, r_2) = (s_1, s_2) = (0, 0) \\
\frac{1}{15} \sum_{(r_1', r_2') \neq (0,0)} \gamma_t(r_1', r_2', r_3, \ldots, r_n, r_1', r_2', s_3, \ldots, s_n) & (r_1, r_2) = (s_1, s_2) \neq (0, 0).
\end{cases}$$

The key idea of Oliveira et al. [26] was to map the evolution of the $\gamma(p,p)$ coefficients to a Markov
chain. We can apply this here to get, on state space $\{0, 1, 2, 3\}^n$, the evolution:

1. Choose a pair of sites uniformly at random.

2. If the state of the pair is 00 it remains 00.

3. Otherwise, choose the state of the pair uniformly at random from $\{0, 1, 2, 3\}^2 \setminus \{00\}$.

This is the correct evolution since, if the initial state is distributed according to $\gamma_t(q,q)$, the final
state is distributed according to $\gamma_{t+1}(p,p)$.
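The three steps of this chain can be written as a short simulation. This is a minimal sketch for illustration only; the names `step`, `run` and `NONZERO_PAIRS` are hypothetical, and a state is represented as a list of integers in $\{0, 1, 2, 3\}$, one per site.

```python
import random
from typing import List

# The 15 allowed pair states: everything in {0,1,2,3}^2 except (0, 0).
NONZERO_PAIRS = [(a, b) for a in range(4) for b in range(4) if (a, b) != (0, 0)]


def step(state: List[int], rng: random.Random) -> List[int]:
    """Apply one step of the U(4) chain to a string over {0, 1, 2, 3}."""
    new_state = list(state)
    # Step 1: choose a pair of distinct sites uniformly at random.
    i, j = rng.sample(range(len(state)), 2)
    # Step 2: a pair in state 00 is left unchanged.
    if (state[i], state[j]) != (0, 0):
        # Step 3: otherwise resample the pair uniformly from the 15 non-00 states.
        new_state[i], new_state[j] = rng.choice(NONZERO_PAIRS)
    return new_state


def run(state: List[int], steps: int, seed: int = 0) -> List[int]:
    """Run the chain for a fixed number of steps from a given starting string."""
    rng = random.Random(seed)
    for _ in range(steps):
        state = step(state, rng)
    return state


if __name__ == "__main__":
    print(run([1, 0, 0, 0, 0, 0], steps=50))
```

Two features of the text are visible directly in the code: the all-zero string is never changed, and a string with at least one non-zero site can never become all zero, because a non-00 pair is always replaced by a non-00 pair.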

The evolution for other gate sets will be similar, but the states will not be chosen uniformly
randomly in the third step. However, the state 00 will remain 00 and the stationary distribution
on the other 15 states is the same. We will find the convergence times for general gate sets and
then consider the $U(4)$ gate set since we can perform a tight analysis for this case.



4.3 Markov Chain Analysis 

Before finding the convergence rate for our problem, we will briefly introduce the basics of Markov 
chain mixing time analysis. All of these standard results can be found in Ref. [25] and references 
therein.

A process is Markov if the evolution only depends on the current state rather than the full state
history. Therefore the evolution of the state can be thought of as a matrix, the transition matrix,
acting on a vector which represents the current distribution. We will only be interested in discrete
time processes so the state after $t$ steps is given by the $t$-th power of the transition matrix acting on
the initial distribution.

We say a Markov chain is irreducible if it is possible to get from one state to any other state in 
some number of steps. Further, a chain is aperiodic if it does not return to a state at regular 
intervals. If a chain is both irreducible and aperiodic then it is said to be ergodic. A well known 
result of Markov chain theory is that all ergodic chains converge to a unique stationary distribution. 
In matrix language this says that the transition matrix P has eigenvalue 1 with no multiplicity 
and all other eigenvalues have absolute value strictly less than 1. We will also need the notion of 
reversibility. A Markov chain is reversible if the time reversed chain has the same transition matrix, 
with respect to some distribution. This condition is also known as detailed balance: 

$$\pi(x) P(x,y) = \pi(y) P(y,x). \tag{4.8}$$

It can be shown that a reversible ergodic Markov chain is only reversible with respect to the
stationary distribution. So above $\pi(x)$ is the stationary distribution of $P$. An immediate consequence
of this is that for a chain with uniform stationary distribution, it is reversible if and only if it is
symmetric (i.e. $P(x,y) = P(y,x)$). Note also that reversible chains have real eigenvalues, since
they are similar to the symmetric matrix $\sqrt{\pi(x)/\pi(y)}\, P(x,y)$.
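Detailed balance and the symmetrisation used to show that the eigenvalues are real can both be checked on a small example. The sketch below assumes a hypothetical 3-state birth-death chain (`P_EXAMPLE`, `PI_EXAMPLE`) that does not appear in the paper; the helper names are likewise illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def satisfies_detailed_balance(P: Matrix, pi: List[float], tol: float = 1e-12) -> bool:
    """Check pi(x) P(x, y) == pi(y) P(y, x) for every pair of states."""
    n = len(P)
    return all(
        abs(pi[x] * P[x][y] - pi[y] * P[y][x]) <= tol
        for x in range(n)
        for y in range(n)
    )


def symmetrised(P: Matrix, pi: List[float]) -> Matrix:
    """Return S with S[x][y] = sqrt(pi[x] / pi[y]) * P[x][y]."""
    n = len(P)
    return [[math.sqrt(pi[x] / pi[y]) * P[x][y] for y in range(n)] for x in range(n)]


def is_symmetric(S: Matrix, tol: float = 1e-12) -> bool:
    """Check S[x][y] == S[y][x] up to a small tolerance."""
    n = len(S)
    return all(abs(S[x][y] - S[y][x]) <= tol for x in range(n) for y in range(n))


# Hypothetical example: a lazy walk on three states with stationary distribution (1/4, 1/2, 1/4).
P_EXAMPLE = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
PI_EXAMPLE = [0.25, 0.50, 0.25]

if __name__ == "__main__":
    print(satisfies_detailed_balance(P_EXAMPLE, PI_EXAMPLE))
    print(is_symmetric(symmetrised(P_EXAMPLE, PI_EXAMPLE)))
```

The example chain is not itself symmetric, since its stationary distribution is not uniform, but the rescaled matrix is, which is the similarity transformation referred to in the text.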

With these definitions and concepts, we can now ask how quickly the Markov chain converges to
the stationary distribution. This is normally defined in terms of the 1-norm mixing time. We use
(half the) 1-norm distance to measure distances between distributions:

$$\|s - t\| := \frac{1}{2} \|s - t\|_1 = \frac{1}{2} \sum_i |s_i - t_i|. \tag{4.9}$$

We assume all distributions are normalised so then $0 \le \|s - t\| \le 1$. We can now define the mixing
time:

Definition 4.3. Let $\pi$ be the stationary distribution of $P$. Then if $P$ is ergodic the mixing time $\tau$
is

$$\tau(\epsilon) = \max_s \min\{ t \ge 0 : \|P^t s - \pi\| \le \epsilon \}. \tag{4.10}$$

We will also use the (weaker) 2-norm mixing time (note this is not the same as $\tau_2$ in Ref. [25]):

Definition 4.4. Let $\pi$ be the stationary distribution of $P$. Then if $P$ is ergodic the 2-norm mixing
time $\tau_2$ is

$$\tau_2(\epsilon) = \max_s \min\{ t \ge 0 : \|P^t s - \pi\|_2 \le \epsilon \}. \tag{4.11}$$

Unless otherwise stated, when we say mixing time we are referring to the 1-norm mixing time. 

There are many techniques for bounding the mixing time, including finding the second largest
eigenvalue of $P$. This gives a good measure of the mixing time because components parallel to the
second largest eigenvector decay the slowest. We have (for reversible ergodic chains)

Theorem 4.5 (see Ref. [25], Corollary 1.15).

$$\tau(\epsilon) \le \frac{1}{\Delta} \ln \frac{1}{\pi_* \epsilon}$$

where $\pi_* = \min_x \pi(x)$ and $\Delta = \min(1 - \lambda_2, 1 + \lambda_{\min})$ where $\lambda_2$ is the second largest eigenvalue and
$\lambda_{\min}$ is the smallest. $\Delta$ is known as the gap.

If the chain is irreversible, it may not even have real eigenvalues. However, we can bound the
mixing time in terms of the eigenvalues of the reversible matrix $PP^*$ where $P^*(x,y) = \frac{\pi(y)}{\pi(x)} P(y,x)$.
In this case we have ([25], Corollary 1.14)

$$\tau(\epsilon) \le \frac{2}{\Delta_{PP^*}} \ln \frac{1}{\pi_* \epsilon} \tag{4.12}$$

where now $\Delta_{PP^*}$ is the gap of the chain $PP^*$. Note that for a reversible chain $P = P^*$ and
$\Delta_{PP^*} \approx 2\Delta$ so the bounds are approximately the same.

This can also be converted into a 2-norm mixing time bound:

$$\tau_2(\epsilon) \le \frac{2}{\Delta_{PP^*}} \ln 1/\epsilon. \tag{4.13}$$
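To see how a gap bound turns into a mixing-time bound for the chains considered here, the following sketch evaluates the standard reversible-chain estimate $\frac{1}{\Delta} \ln \frac{1}{\pi_* \epsilon}$ for a uniform stationary distribution on $4^n - 1$ states. The function names are hypothetical, and the gap value passed in is an input assumption, not something the code computes.

```python
import math


def mixing_time_bound_reversible(gap: float, pi_min: float, eps: float) -> float:
    """Evaluate (1 / gap) * ln(1 / (pi_min * eps))."""
    return math.log(1.0 / (pi_min * eps)) / gap


def pauli_chain_bound(n: int, gap: float, eps: float) -> float:
    """Same bound when the stationary distribution is uniform on 4^n - 1 states."""
    # ln(1 / (pi_min * eps)) = ln(4^n - 1) + ln(1 / eps), evaluated in log form
    # so that large n does not overflow.
    log_states = n * math.log(4.0) + math.log1p(-(4.0 ** (-n)))
    return (log_states + math.log(1.0 / eps)) / gap


if __name__ == "__main__":
    # Assumed gap of the form c / n with c = 1, for illustration only.
    for n in (10, 100, 1000):
        print(n, pauli_chain_bound(n, gap=1.0 / n, eps=1e-3))
```

Because $\ln(1/\pi_*)$ grows linearly in $n$, a gap of order $1/n$ yields a bound of order $n(n + \log 1/\epsilon)$ from this estimate alone, which is why a sharper tool is needed for an $O(n \log \frac{n}{\epsilon})$ result.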

To bound the gap, we will use the comparison theorem in Theorem 4.6 below. In this Theorem, we
are thinking of the Markov chain as a directed graph where the vertices are the states and there
are edges for allowed transitions (i.e. transitions with non-zero probability). For irreducible chains,
it is possible to make a path from any vertex to any other; we call the path length the number of
transitions in such a path (which will in general depend on the choice of path).

Theorem 4.6 (see Ref. [25], Theorem 2.14). Let $P$ and $\hat{P}$ be two Markov chains on the same state
space $\Omega$ with the same stationary distribution $\pi$. Then, for every $x \neq y \in \Omega$ with $\hat{P}(x,y) > 0$ define
a directed path $\gamma_{xy}$ from $x$ to $y$ along edges in $P$ and let its length be $|\gamma_{xy}|$. Let $\Gamma$ be the set of all
such paths. Then

$$\Delta \ge \hat{\Delta}/A$$

for the gaps $\Delta$ and $\hat{\Delta}$ where

$$A = A(\Gamma) = \max_{a \neq b,\, P(a,b) \neq 0} \frac{1}{\pi(a) P(a,b)} \sum_{x \neq y:\, (a,b) \in \gamma_{xy}} \pi(x) \hat{P}(x,y)\, |\gamma_{xy}|.$$

For example, when comparing 1-dimensional random walks there is no choice in the paths; they
must pass through every point between $x$ and $y$. Further, the walk can only progress one step at a
time so (without loss of generality, for reversible chains) let $b = a + 1$ to give

$$A = \max_a \frac{1}{\pi(a) P(a,a+1)} \sum_{x \le a} \sum_{y \ge a+1} \pi(x) \hat{P}(x,y)(y - x)
 = \max_a \frac{\hat{P}(a,a+1)}{P(a,a+1)}. \tag{4.14}$$
A generalisation of the comparison theorem involves constructing flows, which are weighted sets
of paths between states. This can give a tighter bound since bottlenecks are averaged over. This
gives a modified comparison theorem:

Theorem 4.7 ([12], Theorem 2.3). Let $P$ and $\hat{P}$ be two Markov chains on the same state space $\Omega$
with the same stationary distribution $\pi$. Then, for every $x \neq y \in \Omega$ with $\hat{P}(x,y) > 0$, construct a
set of directed paths $\Gamma_{xy}$ from $x$ to $y$ along edges in $P$. We define the flow function $f$ which maps
each path $\gamma_{xy} \in \Gamma_{xy}$ to a real number in the interval $[0, 1]$ such that

$$\sum_{\gamma_{xy} \in \Gamma_{xy}} f(\gamma_{xy}) = \hat{P}(x,y).$$

Again, let the length of each path be $|\gamma_{xy}|$. Then

$$\Delta \ge \hat{\Delta}/A$$

for the gaps $\Delta$ and $\hat{\Delta}$ where

$$A = A(f) = \max_{a \neq b,\, P(a,b) \neq 0} \frac{1}{\pi(a) P(a,b)} \sum_{x \neq y,\, \gamma_{xy} \in \Gamma_{xy}:\, (a,b) \in \gamma_{xy}} \pi(x) f(\gamma_{xy})\, |\gamma_{xy}|. \tag{4.15}$$

Note that we recover the comparison theorem when there is just one path between each $x$ and $y$.



4.3.1 log-Sobolev Constant 

We will need tighter, but more complicated, mixing time results to prove the tight result for the
$U(4)$ case. We use the log-Sobolev constant:

Definition 4.8. The log-Sobolev constant $\rho$ of a chain with transition matrix $P$ and stationary
distribution $\pi$ is

$$\rho = \min_f \frac{\sum_{x \neq y} (f(x) - f(y))^2 P(x,y) \pi(y)}{\sum_x \pi(x) f(x)^2 \log \frac{f(x)^2}{\sum_y \pi(y) f(y)^2}}.$$

The mixing time result is:

Lemma 4.9 (see Ref. [13], Theorem 3.7'). The mixing time of a finite, reversible, irreducible
Markov chain is

$$\tau(\epsilon) = O\left( \frac{1}{\rho} \log \log \frac{1}{\pi_*} + \frac{1}{\Delta} \log \frac{d}{\epsilon} \right) \tag{4.16}$$

where $\rho$ is the Sobolev constant, $\pi_*$ is the smallest value of the stationary distribution, $\Delta$ is the gap
and $d$ is the size of the state space.

Further, the comparison theorem (Theorem 4.6) works just the same to give

$$\rho \ge \hat{\rho}/A.$$

We will need one more result, due to Diaconis and Saloff-Coste:

Lemma 4.10 ([13], Lemma 3.2). Let $P_i$, $i = 1, \ldots, d$, be Markov chains with gaps $\Delta_i$ and Sobolev
constants $\rho_i$. Now construct the product chain $P$. This chain has state space equal to the product
of the spaces for the chains $P_i$ and at each step one of the chains is chosen at random and run for
one step. Then $P$ has spectral gap given by:

$$\Delta = \frac{1}{d} \min_i \Delta_i$$

and Sobolev constant:

$$\rho = \frac{1}{d} \min_i \rho_i.$$



4.4 Convergence Proof 

We now prove the Markov chain convergence results to show that the $\gamma(p,p)$ terms converge quickly.
We have already shown that the $\gamma(p_1,p_2)$ terms with $p_1 \neq p_2$ converge quickly and that there is
no mixing between these terms and the $\gamma(p,p)$ terms. Therefore, in this section, we remove such
terms from $G$.

We want to prove the Markov chain with transition matrix (Eqn. 4.2)

$$P = \frac{1}{n(n-1)} \sum_{i \neq j} G^{(ij)}$$

converges quickly. Firstly, we know from Section 3.3 that $P$ has two eigenvectors with eigenvalue
1. The first is the identity state ($\sigma_0 \otimes \sigma_0$) and the second is the uniform sum of all non-identity
terms ($\frac{1}{4^n - 1} \sum_{p \neq 0} \sigma_p \otimes \sigma_p$). From now on, we remove the identity state. This makes the chain
irreducible. Since we know it converges, it must be aperiodic also so the chain is ergodic and all
other eigenvalues are strictly between 1 and $-1$.

We show here that the gap of this chain, up to constants, does not depend on the choice of 2-copy
gapped gate set. In the second half of the paper we find a tight bound on the gap for the $U(4)$ case
which consequently gives a tight bound on the gap for all universal sets.

Since the stationary distribution is uniform, the chain is reversible if and only if $P$ is a symmetric
matrix. A sufficient condition for $P$ to be symmetric is for $G^{(ij)}$ to be symmetric. We saw in
Theorem 3.3 that for the $U(4)$ gate set case $G^{(ij)}$ is symmetric. In fact, the proof works identically
to show that $G^{(ij)}$ is symmetric for any gate set, provided the set is invariant under Hermitian
conjugation. However, 2-copy gapped gate sets do not necessarily have this property so the Markov
chain is not necessarily reversible. We will find equal bounds (up to constants) for the gaps of both
$P$ (if $G$ is symmetric) and $PP^*$ (if $G$ is not symmetric) below:

Theorem 4.11. Let $\mu$ be any 2-copy gapped distribution of gates. If $\mu$ is invariant under Hermitian
conjugation then let $\Delta_P$ be the eigenvalue gap of the resulting Markov chain matrix $P$. Then

$$\Delta_P = \Omega(\Delta_{U(4)}) \tag{4.17}$$

where $\Delta_{U(4)}$ is the eigenvalue gap of the $U(4)$ chain. If $\mu$ is not invariant under Hermitian
conjugation then let $\Delta_{PP^*}$ be the eigenvalue gap of the resulting Markov chain matrix $PP^*$. Then

$$\Delta_{PP^*} = \Omega(\Delta_{U(4)}). \tag{4.18}$$

Proof. We will use the comparison method with flows (Theorem 4.7). Firstly consider the case
where $\mu$ is closed under Hermitian conjugation i.e. $G$ is symmetric.

We will compare $P$ to the $U(4)$ chain, which we call $P_{U(4)}$. Recall that this chain chooses a pair
at random and does nothing if the pair is 00 and chooses a random state from $\{0, 1, 2, 3\}^2 \setminus \{00\}$
otherwise.

To apply Theorem 4.7 we need to construct the flows between transitions in $P_{U(4)}$. We will choose
paths such that only one pair is modified throughout. For example (with $n = 4$), the transition
$1000 \to 2000$ is allowed in $P_{U(4)}$. To construct a path in $P$, we need to find allowed transitions
between these two states in $P$. $G$ may not include the transition $10 \to 20$ directly, however, $G$
is irreducible on this subspace of just two pairs. This means that a path exists and can be of
maximum length 14 if it has to cycle through all intermediate states (in fact, since $G$ is symmetric
the maximum path length is 8; all that is important here is that it is constant). For example,
the transitions $10 \to 11 \to 20$ might be allowed. Then we could choose the full path to be
$1000 \to 1100 \to 2000$. In this case we have chosen the path to involve transitions pairing sites 1
and 2. However, we could equally well have chosen any pairing; we could pair the first site with
any of the others. We can choose 3 paths in this way. For this example, the flow we want to choose
will be all 3 of these paths equally weighted. We now use this idea to construct flows between all
transitions in $P_{U(4)}$ to prove the result.

Let $x \neq y \in \Omega$ and let $d(x,y)$ be the Hamming distance between the states ($d(x,y)$ gives the
number of places at which $x$ and $y$ differ). There are two cases where $P_{U(4)}(x,y) \neq 0$:

1. $d(x,y) = 2$. Here we must choose a unique pairing, specified by the two sites that differ.
Make all transitions in $P$ using this pair giving just one path.

2. $d(x,y) = 1$. For this case, choose all possible pairings of the changing site that give allowed
transitions in $P_{U(4)}$. For each pairing, construct a path in $P$ modifying only this pair. If the
differing site is initially non-zero then there are $n - 1$ such pairings; if the differing site is
initially zero then there are $n - z(x)$ pairings where $z(x)$ is the number of zeroes in the state
$x$.

All the above paths are of constant length since we have to (at most) cycle through all states of a
pair. We must now choose the weighting $f(\gamma_{xy})$ for each path such that

$$\sum_{\gamma_{xy} \in \Gamma_{xy}} f(\gamma_{xy}) = P_{U(4)}(x,y) \tag{4.19}$$

where $\Gamma_{xy}$ is the set of all paths from $x$ to $y$ constructed above. We choose the weighting of each
path to be uniform. We just need to calculate the number of paths in $\Gamma_{xy}$ to find $f$:

1. $d(x,y) = 2$. There is just one path so $f(\gamma_{xy}) = P_{U(4)}(x,y) = \Theta(1/n^2)$.

2. $d(x,y) = 1$. If the differing site is initially non-zero then $P_{U(4)}(x,y) = \Theta(1/n)$ and there
are $n - 1$ paths so $f(\gamma_{xy}) = \frac{P_{U(4)}(x,y)}{n - 1} = \Theta(1/n^2)$. If the differing site is initially zero then
$P_{U(4)}(x,y) = \Theta\left( \frac{n - z(x)}{n^2} \right)$ and there are $n - z(x)$ paths so $f(\gamma_{xy}) = \frac{P_{U(4)}(x,y)}{n - z(x)} = \Theta(1/n^2)$.

So for all paths, $f = O(1/n^2)$. We now just need to know how many times each edge $(a,b)$ in $P$ is
used to calculate $A$:

$$A = \max_{a \neq b,\, P(a,b) \neq 0} A(a,b) \tag{4.20}$$

where

$$A(a,b) = \frac{1}{P(a,b)} \sum_{x \neq y,\, \gamma_{xy} \in \Gamma_{xy}:\, (a,b) \in \gamma_{xy}} f(\gamma_{xy}).$$

We have cancelled the factors of $\pi(x)$ because the stationary distribution is uniform. We have also
ignored the lengths of the paths since they are all constant.

To evaluate $A(a,b)$, we need to know how many paths pass through each edge $(a,b)$. We again
consider the two possibilities separately:

1. $d(a,b) = 2$. Suppose $a$ and $b$ differ at sites $i$ and $j$. Firstly, we need to count how many
transitions from $x$ to $y$ in $P_{U(4)}$ could use this edge, and then how many paths for each
transition actually use the edge.

To find which $x$ and $y$ could use the edge, note that $x$ and $y$ must differ at sites $i$, $j$ or
both. Furthermore, the values at the sites other than $i$ and $j$ must be the same as for $a$
(and therefore $b$). There is a constant number of $x,y$ pairs that satisfy this condition. Now,
for each $x,y$ pair satisfying this, paths that use this edge must use the pairing $i,j$ for all
transitions. Since in the paths we have chosen above there is a unique path from $x$ to $y$ for
each pairing, there is at most one path for each $x,y$ pair that uses edge $(a,b)$.

For $d(a,b) = 2$, $P(a,b) = \Theta(1/n^2)$ so $A(a,b)$ is a constant for this case.

2. $d(a,b) = 1$. Let there be $r$ pairings that give allowed transitions in $P$ between $a$ and $b$. As
above, each pairing gives a constant number of paths. So the numerator is $O(r/n^2)$. Further,
$P(a,b) = \Theta(r/n^2)$. So again $A(a,b)$ is constant.

Combining, $A$ is a constant so the result is proven for the case $G$ is symmetric.

We now turn to the irreversible case. We now need to bound the gap of $PP^* = PP^T$. This chain
selects two (possibly overlapping) pairs at random and applies $G$ to one of them and $G^T$ to the
other. We can use the above exactly by choosing $G$ to perform the transitions above and $G^T$ to just
loop the states back to themselves. By aperiodicity (the greatest common divisor of loop lengths
is 1), we can always find constant length paths that do this. □

Now we need to know the gap of the $U(4)$ chain. We can, by a simple application of the comparison
theorem, show it is $\Omega(1/n^2)$. However, in the second half of this paper we show it is $\Theta(1/n)$. This
gives us (using Theorem 4.5):

Corollary 4.12. The Markov chain $P$ has mixing time $O(n(n + \log 1/\epsilon))$ and 2-norm mixing time
$O(n \log 1/\epsilon)$.

We conjecture that the mixing time (as well as Lemma 4.2) can be tightened to $O(n \log \frac{n}{\epsilon})$, which
is asymptotically the same as for the $U(4)$ case:

Conjecture 4.13. The second moments for the case of general 2-copy gapped distributions have
1-norm mixing time $O(n \log \frac{n}{\epsilon})$.

It seems likely that an extension of our techniques in Section 5 could be used to prove this.
Combining the convergence results we have proved our general result Lemma 2.11:

Proof of Lemma 2.11. Combining Corollary 4.12 (for the $\gamma(p,p)$ terms) and Lemma 4.2 (for the
$\gamma(p_1,p_2)$, $p_1 \neq p_2$ terms) proves the result. □

We have now shown that the first and second moments of random circuits converge quickly. For
the remainder of the paper we prove the tight bound for the gap and mixing time of the $U(4)$ case
and show how mixing time bounds relate to the closeness of the 2-design to an exact design. Only
for the $U(4)$ case is the matrix $G$ a projector so in this sense the $U(4)$ random circuit is the most
fundamental. While we expect the above mixing time bound is not tight, we can prove a tight
mixing time result for the $U(4)$ case. However, using our definition of an approximate $k$-design,
the gap rather than the mixing time governs the degree of approximation.



5 Tight Analysis for the U(4) Case

We have already found tight bounds for the first moments in Lemma 4.1: just set $\Delta = 1$.



5.1 Second Moments Convergence

We need to prove a result analogous to Lemma 4.2 for the terms $\sigma_{p_1} \otimes \sigma_{p_2}$ where $p_1 \neq p_2$. We
already have a tight bound for the 2-norm decay, by setting $\Delta = 1$ in Lemma 4.2. We tighten
the 1-norm bound:

Lemma 5.1. After $O(n \log \frac{n}{\epsilon})$ steps

$$\sum_{p_1 \neq p_2} \mathbb{E}_W |\gamma_W(p_1,p_2)| \le \epsilon. \tag{5.1}$$

Proof. We will split the random circuits up into classes depending on how many qubits have been
hit. Let $H$ be the random variable giving the number of different qubits that have been hit. We
can work out the distribution of $H$ and bound the sum of $|\gamma_W(p_1,p_2)|$ for each outcome.

Firstly we have, after $t$ steps,

$$\mathbb{P}(H = h) \le \binom{n}{h} \left( \frac{h}{n} \right)^t.$$

Now, for each qubit hit, each coefficient which has $p_1$ and $p_2$ differing in this place is set to zero.
So after $h$ have been hit, there are only (at most) $16^{n-h}$ terms in the sum in Eqn. 5.1. As before,
the state is a physical state, $\mathrm{tr}\, \rho^2 \le 1$ so $\sum_{p_1 p_2} \gamma^2(p_1,p_2) \le 1$ so $\sum_{p_1 p_2} |\gamma(p_1,p_2)| \le \sqrt{N}$ if there
are at most $N$ non-zero terms in the sum. Therefore we have, after $t$ steps,

$$\begin{aligned}
\sum_{p_1 \neq p_2} \mathbb{E}_W |\gamma_W(p_1,p_2)|
 &\le \sum_{h=1}^{n-1} \mathbb{P}(H = h)\, 16^{(n-h)/2} \\
 &\le \sum_{h=1}^{n-1} \binom{n}{h} \left( \frac{h}{n} \right)^t 4^{n-h} \\
 &= \sum_{h=1}^{n-1} \binom{n}{n-h} (1 - h/n)^t\, 4^h \\
 &\le \sum_{h=1}^{n-1} \binom{n}{h} \exp(-ht/n)\, 4^h.
\end{aligned}$$

Now, let $t = n \ln \frac{4n}{\epsilon}$:

$$\sum_{p_1 \neq p_2} \mathbb{E}_W |\gamma_W(p_1,p_2)| \le \sum_{h=1}^{n-1} \binom{n}{h} \left( \frac{\epsilon}{n} \right)^h \le (1 + \epsilon/n)^n - 1 = O(\epsilon)$$

where the last line follows from the binomial theorem. □

This, combined with the mixing time result we prove below, completes the proof that the second
moments of the random circuit converge in time $O(n \log \frac{n}{\epsilon})$.

5.2 Markov Chain of Coefficients

The Markov chain acting on the coefficients is reducible because the state $\{0\}^n$ is isolated. However,
if we remove it then the chain becomes irreducible. The presence of self loops implies aperiodicity
therefore the chain is ergodic. We have already seen that the chain converges to the Haar uniform
distribution (in Section 1.1) therefore the stationary state is the uniform state $\pi(x) = 1/(4^n - 1)$.
Further, since the chain is symmetric and has uniform stationary distribution, the chain satisfies
detailed balance (Eqn. 4.8) so is reversible. We now turn to obtaining bounds on the mixing time
of this chain.

We want to show that the full chain converges to stationarity in time $O(n \log \frac{n}{\epsilon})$. This implies (see
later) that the gap is $\Omega(1/n)$. To prove this, we will construct another chain called the zero chain.
This is the chain that counts the number of zeroes in the state. Since it is the zeroes that slow
down the mixing, this chain will accurately describe the mixing time of the full chain.

Lemma 5.2. The zero chain has transition matrix $P$ on state space (we count non-zero positions)
$\Omega = \{1, 2, \ldots, n\}$.



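$$P(x,y) =
\begin{cases}
1 - \frac{2x(3n - 2x - 1)}{5n(n-1)} & y = x \\
\frac{2x(x-1)}{5n(n-1)} & y = x - 1 \\
\frac{6x(n-x)}{5n(n-1)} & y = x + 1 \\
0 & \text{otherwise}
\end{cases} \tag{5.2}$$

for $1 \le x, y \le n$.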

Proof. Suppose there are $n - x$ zeroes (so there are $x$ non-zeroes). Then the only way the number
of zeroes can decrease (i.e. for $x$ to increase) is if a non-zero item is paired with a zero item and
one of the 9 (out of 15) new states is chosen with no zeroes. The probability of choosing such a
pair is $\frac{2x(n-x)}{n(n-1)}$ so the overall probability is $\frac{9}{15} \cdot \frac{2x(n-x)}{n(n-1)}$.

The number of zeroes can increase only if a pair of non-zero items is chosen and one of the 6 states
is chosen with one zero. The probability of this occurring is $\frac{6}{15} \cdot \frac{x(x-1)}{n(n-1)}$.

The probability of the number of zeroes remaining unchanged is simply calculated by requiring the
probabilities to sum to 1. □
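The counting in this proof can be reproduced with exact rational arithmetic. The sketch below builds the zero-chain transition probabilities directly from the pair-counting argument (9 of 15 pair states with no zero, 6 of 15 with exactly one zero) rather than from the closed forms; the function name `zero_chain` and the dictionary representation are illustrative choices, and $n \ge 2$ is assumed.

```python
from fractions import Fraction
from typing import Dict, Tuple


def zero_chain(n: int) -> Dict[Tuple[int, int], Fraction]:
    """Transition probabilities P[(x, y)] for x, y in 1..n, x = number of non-zero sites."""
    pairs = Fraction(n * (n - 1), 2)  # number of unordered pairs of sites
    P: Dict[Tuple[int, int], Fraction] = {}
    for x in range(1, n + 1):
        # One non-zero and one zero site are chosen, and the new pair state
        # is one of the 9 (out of 15) with no zeroes.
        up = Fraction(x * (n - x)) / pairs * Fraction(9, 15)
        # Two non-zero sites are chosen, and the new pair state is one of
        # the 6 (out of 15) with exactly one zero.
        down = Fraction(x * (x - 1), 2) / pairs * Fraction(6, 15)
        if x < n:
            P[(x, x + 1)] = up
        if x > 1:
            P[(x, x - 1)] = down
        # Remaining probability: the number of non-zeroes is unchanged.
        P[(x, x)] = 1 - up - down
    return P


if __name__ == "__main__":
    for (x, y), prob in sorted(zero_chain(4).items()):
        print(x, y, prob)
```

Comparing the output against the closed forms $\frac{6x(n-x)}{5n(n-1)}$ and $\frac{2x(x-1)}{5n(n-1)}$ is a quick consistency check on the lemma.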

We see that the zero chain is a one-dimensional random walk on the line. It is a lazy random walk
because the probability of moving at each step is $< 1$. However, as the number of zeroes decreases,
the probability of moving increases monotonically:

$$\frac{2x}{5n} \le 1 - P(x,x) = \frac{2x(3n - 2x - 1)}{5n(n-1)} < 1. \tag{5.3}$$

Lemma 5.3. The stationary distribution of the zero chain is

$$\pi_0(x) = \frac{3^x \binom{n}{x}}{4^n - 1}. \tag{5.4}$$

Proof. This can be proven by multiplying the transition matrix in Lemma 5.2 by the state Eqn. 5.4.
Alternatively, it can be proven by counting the number of states with $n - x$ zeroes. There are $\binom{n}{x}$
ways of choosing which sites to make non-zero and each non-zero site can be one of three possibilities:
1, 2 or 3. The total number of states is $4^n - 1$, which gives the result. □
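The counting argument gives a candidate stationary distribution proportional to $3^x \binom{n}{x}$, normalised by $4^n - 1$. As a sanity check, the sketch below verifies with exact arithmetic that this distribution is normalised and satisfies detailed balance with the nearest-neighbour transition probabilities of the zero chain. The helper names are hypothetical, and the closed-form `up` and `down` probabilities are taken as assumptions from Lemma 5.2.

```python
from fractions import Fraction
from math import comb


def stationary(n: int, x: int) -> Fraction:
    """3^x * C(n, x) / (4^n - 1): fraction of non-identity strings with x non-zero sites."""
    return Fraction(3 ** x * comb(n, x), 4 ** n - 1)


def up(n: int, x: int) -> Fraction:
    """Probability of moving from x to x + 1 non-zero sites."""
    return Fraction(6 * x * (n - x), 5 * n * (n - 1))


def down(n: int, x: int) -> Fraction:
    """Probability of moving from x to x - 1 non-zero sites."""
    return Fraction(2 * x * (x - 1), 5 * n * (n - 1))


def check(n: int) -> bool:
    """Return True if the distribution is normalised and detailed balance holds."""
    normalised = sum(stationary(n, x) for x in range(1, n + 1)) == 1
    balanced = all(
        stationary(n, x) * up(n, x) == stationary(n, x + 1) * down(n, x + 1)
        for x in range(1, n)
    )
    return normalised and balanced


if __name__ == "__main__":
    print(all(check(n) for n in range(2, 15)))
```

For a nearest-neighbour walk, detailed balance on every edge implies stationarity, so this check is equivalent to multiplying the transition matrix by the candidate distribution.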



Below we will prove the following theorem:

Theorem 5.4. The zero chain mixes in time $O(n \log \frac{n}{\epsilon})$.

The 2-norm mixing time follows easily:

Theorem 5.5. The zero chain has 2-norm mixing time $O(n \log 1/\epsilon)$.

Proof. We use a lower bound on the 1-norm mixing time to show that the gap of the zero chain
is $\Omega(1/n)$ and then use the 2-norm mixing bound Eqn. 4.13. In [25], Theorem 4.9, they prove the
lower bound:

$$\tau_1(\epsilon) \ge \frac{1 - \Delta}{\Delta} \ln \frac{1}{2\epsilon} \tag{5.5}$$

where $\Delta$ is the eigenvalue gap. In Theorem 5.4 we showed $\tau_1(\epsilon) \le Cn \ln \frac{n}{\epsilon}$ for some constant $C$.
Combining,

$$\frac{1 - \Delta}{\Delta} \ln \frac{1}{2\epsilon} \le Cn \ln \frac{n}{\epsilon} \tag{5.6}$$

for all $\epsilon > 0$. Divide by $\ln 1/\epsilon$ and take the limit $\epsilon \to 0$ to find

$$\frac{1 - \Delta}{\Delta} \le Cn \tag{5.7}$$

which implies the gap is $\Omega(1/n)$. The 2-norm bound now follows from Eqn. 4.13. □

Before proving Theorem 15.41 we will show how the mixing time of the full chain follows from this. 
Corollary 5.6. The full chain mixes in time 0(nlog^). 

Proof. Once the zero chain has approximately mixed, the distribution of zeroes is almost correct. We need to prove that the distribution of non-zeroes is correct after $O(n \log \frac{1}{\epsilon})$ steps too.

Once each site of the full chain has been hit, meaning it is chosen and paired with another site so that the two are not both zero, the chain has mixed. This is because, after each site has been hit, the probability distribution over the states is uniform. When the zero chain has approximately mixed, a constant fraction of sites are non-zero so the probability of hitting a site at each step is $\Theta(1/n)$. By the coupon collector argument, each site will have been hit with probability at least $1 - \epsilon$ in time $O(n \log \frac{n}{\epsilon})$. Once the zero chain has mixed to $\epsilon'$, we can run the full chain this extra number of steps to ensure each site has been hit with high probability. Since the mixing of the zero chain only improves with time, both conditions hold with probability at least $1 - \epsilon - \epsilon'$, so the full chain is then within distance $\epsilon + \epsilon'$ of stationarity. We make this formal below.

After $t_0 = O(n \log \frac{1}{\epsilon'})$ steps, the number of zeroes is $\epsilon'$-close to the stationary distribution $\pi_0$ by Theorem 5.4 and only gets closer with more steps since the distance to stationarity decreases monotonically. The stationary distribution Eqn. 5.4 is approximately a Gaussian peaked at $3n/4$ with $O(n)$ variance. This means that, with high probability, the number of non-zeroes is close to $3n/4$. We will in fact only need that there is at least a constant fraction of non-zeroes; with probability at least $1 - \epsilon' - \exp(-\Omega(n))$ there will be at least $n/2$.

To prove the mixing time, we run the chain for time $t_0$ so the zero chain mixes to $\epsilon'$. Then run for $t_1$ additional steps. Let $H_{i,t}$ be the event that site $i$ is hit at step $t$. Let $H_i = \bigcup_{t=t_0+1}^{t_0+t_1} H_{i,t}$ and $H = \bigcap_{i=1}^{n} H_i$. We want to show $\mathbb{P}(H)$ is close to 1, or, in other words, that all sites are hit with high probability. Further let $X_t$ be the random variable giving the number of non-zeroes at step $t$.

If at step $t - 1$ site $i$ is non-zero then the event $H_{i,t}$ occurs if the qubit is chosen, which occurs with probability $2/n$. If, however, it was zero then it must be paired with a non-zero site for $H_{i,t}$ to hold. Conditioned on any history with $X_{t-1} \ge n/2$, this probability is $\ge 1/n$. In particular, we can condition on not having previously hit $i$ and the bound does not change. Combining we have




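$$\mathbb{P}\left(H_{i,t}^c \,\middle|\, [X_{t-1} \ge n/2] \cap \bigcap_{t' < t} H_{i,t'}^c\right) \le 1 - 1/n.$$

Then, after $t_1$ extra steps,

$$\mathbb{P}\left(H_i^c \cap \bigcap_{t=t_0}^{t_0+t_1-1} [X_t \ge n/2]\right) \le (1 - 1/n)^{t_1}$$

which, using the union bound, gives

$$\mathbb{P}\left(H^c \cap \bigcap_{t=t_0}^{t_0+t_1-1} [X_t \ge n/2]\right) \le n(1 - 1/n)^{t_1}.$$

Now, since the zero chain has mixed to $\epsilon'$, and writing $z$ for the number of zeroes,

$$\mathbb{P}\left(\left(\bigcap_{t=t_0}^{t_0+t_1-1} [X_t \ge n/2]\right)^c\right) \le t_1 \sum_{z=n/2}^{n-1} \pi_0(n - z) + \epsilon' \le t_1 \exp(-\Omega(n)) + \epsilon'$$

so

$$\mathbb{P}(H^c) \le n(1 - 1/n)^{t_1} + t_1 \exp(-\Omega(n)) + \epsilon'.$$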

Now, choose $t_1 = n \ln\frac{n}{\epsilon}$ so that $\mathbb{P}(H^c) \le \delta$ where $\delta = \epsilon + \epsilon' + t_1 \exp(-\Omega(n))$. Choose $\epsilon = \epsilon' = 1/n$ so that $\delta$ is $1/\mathrm{poly}(n)$. Now, using the bound on $\mathbb{P}(H^c)$, we can write the state $v$ after $t_1 = O(n \log n)$ steps as

$$v = (1 - \delta)\pi + \delta\pi'$$

where $\pi$ is the stationary distribution and $\pi'$ is any other distribution. Using this,

$$\|v - \pi\| \le \delta.$$

We now apply Lemma A.15 to show that after $O(n \log \frac{1}{\epsilon})$ steps the distance to stationarity of the full chain is $\epsilon$. □
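The coupon collector step in this proof can be illustrated numerically. The sketch below is illustrative only (the function names are not from the paper): it computes the number of extra steps $t_1 = \lceil n \ln(n/\epsilon) \rceil$ and evaluates the union bound $n(1 - 1/n)^{t_1}$ on the probability that some site is never hit, assuming each site is hit with probability at least $1/n$ per step.

```python
import math


def extra_steps(n, eps):
    """Number of additional steps t_1 = ceil(n ln(n / eps))."""
    return math.ceil(n * math.log(n / eps))


def miss_bound(n, t1):
    """Union bound n (1 - 1/n)^{t1} on the probability that some site is never hit."""
    return n * (1 - 1 / n) ** t1


if __name__ == "__main__":
    for n in (10, 100, 1000):
        t1 = extra_steps(n, 0.01)
        print(n, t1, miss_bound(n, t1))
```

Since $(1 - 1/n)^{t_1} \le e^{-t_1/n}$, the printed bound never exceeds the chosen $\epsilon$.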



5.3 Proof of Theorem 5.4 

We will now proceed to prove Theorem 5.4. We present an outline of the proof here; the details are in Section A.2.

Firstly, note that by the coupon collector argument, the lower bound on the time is $\Omega(n \log n)$. We need to prove an upper bound equal to this. Intuition says that the mixing time should be $O(n \log n)$ because the walk has to move a distance $O(n)$ and the waiting time at each step is proportional to $n, n/2, n/3, \ldots$ which sums to $O(n \log n)$, provided each site is not hit too often. We will show that this intuition is correct using Chernoff bound and log-Sobolev (see later) arguments.

We will first work out concentration results of the position after some number of accelerated steps. 
The zero chain has some probability of staying still at each step. The accelerated chain is the zero 
chain conditioned on moving at each step. We define the accelerated chain by its transition matrix: 

Definition 5.7. The transition matrix for the accelerated chain is

$$P_a(x,y) = \begin{cases} 0 & y = x \\ \frac{x-1}{3n-2x-1} & y = x - 1 \\ \frac{3(n-x)}{3n-2x-1} & y = x + 1 \\ 0 & \text{otherwise.} \end{cases} \tag{5.8}$$

In the proof we first show that the accelerated chain mixes quickly, and then bound the waiting time at each step to obtain a mixing time bound for the zero chain.
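The accelerated chain is the zero chain conditioned on moving, so its entries are the zero-chain move probabilities divided by the total moving probability $1 - P(x,x)$. The following sketch checks this exactly; it assumes zero-chain move probabilities $2x(x-1)/(5n(n-1))$ (down) and $6x(n-x)/(5n(n-1))$ (up), consistent with Eqn. 5.3, and uses illustrative function names.

```python
from fractions import Fraction


def zero_chain_moves(n, x):
    """Assumed zero-chain probabilities of moving down / up from x non-zero sites."""
    denom = 5 * n * (n - 1)
    return Fraction(2 * x * (x - 1), denom), Fraction(6 * x * (n - x), denom)


def accelerated(n, x):
    """Condition the zero chain on moving: divide by the total moving probability."""
    down, up = zero_chain_moves(n, x)
    move = down + up  # equals 2x(3n - 2x - 1) / (5n(n - 1))
    return down / move, up / move


def definition_5_7(n, x):
    """The closed form stated in Definition 5.7."""
    d = 3 * n - 2 * x - 1
    return Fraction(x - 1, d), Fraction(3 * (n - x), d)
```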

To prove the mixing time bound, we will split the walk up into three phases. We will split the state space into three (slightly overlapping) parts and the phase can begin at any point within that space. So each phase has a state space $\Omega_i \subseteq [1, n]$, an entry space $E_i \subseteq \Omega_i$ and an exit condition $T_i$. We say that a phase completes successfully if the exit condition is satisfied in time $O(n \log n)$ for an initial state within the entry space. When the exit condition is satisfied, the walk moves onto the next phase.

The phases are:

1. $\Omega_1 = [1, n^\delta]$ for some constant $\delta$ with $0 < \delta < 1/2$. $E_1 = \Omega_1$ (i.e. it can start anywhere) and $T_1$ is satisfied when the walk reaches $n^\delta$. For this part, the probability of moving backwards (gaining zeroes) is $O(n^{\delta-1})$ so the walk progresses forwards at each step with high probability. This is proven in Lemma A.8. We show that the waiting time is $O(n \log n)$ in Lemma A.9.

2. $\Omega_2 = [n^\delta/2, \theta n]$ for some constant $\theta$ with $0 < \theta < 3/4$. $E_2 = [n^\delta, \theta n]$ and $T_2$ is satisfied when the walk reaches $\theta n$. Here the walk can move both ways with constant probability but there is an $\Omega(1)$ forward bias. Here we use a monotonicity argument: the probability of moving forward at each step is

$$p(x) = \frac{3(n-x)}{3n-2x-1} \ge \frac{3(n-x)}{3n-2x} \ge \frac{3(1-\theta)}{3-2\theta}.$$

If we model this random walk as a walk with constant bias equal to $\frac{3(1-\theta)}{3-2\theta}$ we will find an upper bound on the mixing time since mixing time increases monotonically with decreasing bias. Further, the waiting time at $x = a$ stochastically dominates the waiting time at $x = b$ for $b > a$. The true bias decreases with position so the walk with constant bias spends more time at the early steps. Thus the position of this simplified walk is stochastically dominated by the position of the real walk while the waiting time stochastically dominates the waiting time of the real walk.

3. $\Omega_3 = [\frac{\theta}{2} n, n]$ and $E_3 = [\theta n, n]$. $T_3$ is satisfied when this restricted part of the chain has mixed to distance $\epsilon$. Here the bias decreases to zero as the walk approaches $3n/4$ but the moving probability is a constant. We show that this walk mixes quickly by bounding the log-Sobolev constant of the chain.

Showing these three phases complete successfully will give a mixing time bound for the whole chain. 

We prove in the Appendix that the phases complete successfully with probability at least $1 - 1/\mathrm{poly}(n)$:

Lemma 5.8.

$$\mathbb{P}(\text{Phase 1 completes successfully}) \ge 1 - n^{2\delta-1} - 2n^{-\delta}.$$

Lemma 5.9.

$$\mathbb{P}(\text{Phase 2 completes successfully}) \ge 1 - \exp\left(-\frac{2}{3}\mu\theta n\right) - \left(\frac{4}{\theta n}\right)^{\frac{3}{2\mu}} - \frac{2\exp\left(-\mu n^\delta/4\right)}{1 - \exp(-\mu/2)} - (q/p)^{n^\delta/2}$$

where $\mu = \frac{6(1-\theta)}{3-2\theta} - 1$.

Lemma 5.10.

$$\mathbb{P}(\text{Phase 3 completes successfully}) \ge 1 - \left(\frac{\theta}{3(1-\theta)}\right)^{\theta n/2}.$$

We can now finally combine to prove our result:



Proof of Theorem 5.4. The stationary distribution has exponentially small weight in the tail with lots of zeroes. We show that, provided the number of zeroes is within phase 3, the walk mixes in time $O(n \log \frac{1}{\epsilon})$. We also show that if the number of zeroes is initially within phase 1 or 2, after $O(n \log n)$ steps the walk is in phase 3 with high probability. We can work out the distance to the stationary distribution as follows.



Let $p_f$ be the probability of failure. This is the sum of the error probabilities in Lemmas 5.8, 5.9 and 5.10. The key point is that $p_f = 1/\mathrm{poly}(n)$. Then after $O(n \log \frac{n}{\epsilon})$ steps (the sum of the number of steps in the 3 phases), the state is equal to $(1 - p_f)v_3 + p_f v'$ where $v_3$ is the state in the phase 3 space and $v'$ is any other distribution, which occurs if any one of the phases fails. Since the distance to stationarity in phase 3 is $\epsilon$, $\|v_3 - \pi_3\| \le \epsilon$, where $\pi_3$ is the stationary distribution on the state space of phase 3. In Lemma A.13 we show that $\pi_3(x) = \pi(x)/(1 - w)$ where $w = \sum_{x=1}^{\theta n/2 - 1} \pi(x)$. Since $\pi(x)$ is exponentially small in this range, $w$ is exponentially small in $n$. Now use the triangle inequality to find

$$\|v_3 - \pi\| \le \|v_3 - \pi_3\| + \|\pi_3 - \pi\|. \tag{5.9}$$

Since the chain in phase 3 has mixed to $\epsilon$, the first term is $\le \epsilon$. We can evaluate $\|\pi_3 - \pi\|$:

$$\|\pi_3 - \pi\| = \frac{1}{2}\sum_{x=1}^{n} |\pi_3(x) - \pi(x)| = \frac{1}{2}\left(\sum_{x=1}^{\theta n/2 - 1} \pi(x) + \sum_{x=\theta n/2}^{n}\left(\frac{\pi(x)}{1-w} - \pi(x)\right)\right) = \frac{1}{2}\left(w + 1 - (1 - w)\right) = w.$$

So now,

$$\|(1 - p_f)v_3 + p_f v' - \pi\| = \|(1 - p_f)(v_3 - \pi) + p_f(v' - \pi)\|$$
$$\le (1 - p_f)\|v_3 - \pi\| + p_f\|v' - \pi\|$$
$$\le (1 - p_f)(\epsilon + w) + p_f$$
$$\le \delta$$

where $\delta = \epsilon + w + p_f$. We are free to choose $\epsilon$: choose it to be $1/n$ so that $\delta$ is $1/\mathrm{poly}(n)$. So now the running time to get a distance $\delta$ is $t = O(n \log n)$. We then apply Lemma A.15 to obtain the result.

This concludes the proof of Theorem 5.4 so Corollary 5.6 is proved. □



We have now proven Lemma 2.11 and consequently Corollary 2.12. We now show how Theorem 2.10 follows.



6 Main Result 

We will now show how the mixing time results imply that we have an approximate 2-design. 



Proof of Theorem 2.10. We will go via the 2-norm since this gives a tight bound when working with the Pauli operators. The supremum can be taken over just physical states $\rho$ [29]. We write $\rho$ in the Pauli basis as usual (as Eqn. 2.3).



\Gw - Gh\\1 = sw p\\(Gw <8>I)(p) - (Gh &> I){p)\\l 
p 



< 2 4n sup \\(G W ® I)(p) - {G H ®I)(p)\\l 



sup 

p 



^2 7o (pi ,P2,P3,P<i) (Gw (o- Pi ® cr p2 ) <g> o- ps ® a P4 



pi ,P2'P:i-Pi 
P1P27^ 00 



- Gh{o- Pi o P2 ) <S> ■ 
Now, write (for pip 2 / 00) Gw(jnO- pi <g> a P2 ) = ^ 



9i. 92 9t(qi,q2;pi,P2)o- qi 

9192^00 



o q% . We get 



sup 

p 



^2 n /o(puP2,P3,Pi) [9t(qi,q2;pi,P2 



Pl ,P2'P3'Pi>H ,92 
P1P2^00,H 927^0 



"9192 u PlP2 

2™ (2™ + 1) 



a qi ® cr, 



12 



a p3 ® (J; 



P4 



^9l92^PlP2 



= 2 4n sup ^ 7o(pi.P2,P3,P4) ( 9t(qi,q2-,Pi,P2) . 2 „ ( . 2 „ + J 

PlP2^00,9l 927^00 

<2 4n sup ^ 7o(Pi^2,P3,P4)e 2 

^ P1,P2>P3'P4 
P1P2#°° 

< 2 4 V 

where the first equality comes from the orthogonality of the Pauli operators under the Hilbert-Schmidt inner product and the last inequality comes from the fact that $\rho$ is a physical state so has $\operatorname{tr}\rho^2 \le 1$. This proves the result for the diamond norm, Definition 2.5. For the distance measure defined in Definition 2.6, the argument in [10] can be used together with the 1-norm bound to prove the result. □

It is unfortunate that there is still a dimension factor remaining in the above proof. To get a distance $\epsilon$ we have to run the random circuit for $O(n(n + \log 1/\epsilon))$ steps. However, closeness in the diamond-norm may be too stringent a requirement. After $O(n(n + \log 1/\epsilon))$ steps, the random circuit gives a 2-design in the measure used by Dankert et al. (see [10] and Definition 2.6). This is in contrast to the $O(n \log 1/\epsilon)$ steps required by the explicit circuit construction of Dankert et al.

7 Conclusions 

We have proved tight convergence results for the first two moments of a random circuit. We have used this to show that random circuits are efficient approximate unitary 1- and 2-designs. Our framework readily generalises to $k$-designs for any $k$ and the next step in this research is to prove that random circuits give approximate $k$-designs for all $k$.

We have shown that, provided the random circuit uses gates from a universal gate set that is also universal on $U(4)$, the circuit is still an efficient 2-design. We also see that the random circuit with gates chosen uniformly from $U(4)$ is the most natural model. We note that the gates from $U(4)$ can be replaced by gates from any approximate 2-design on two qubits without any change to the asymptotic convergence properties.

One application of this work is to give an efficient method of decoupling two quantum systems by applying a random unitary from a 2-design to one system and then discarding part of it. This technique is used in [2] to construct a variety of encoding circuits for tasks in quantum Shannon theory; thus, we (like [10]) reduce the encoding complexity in [2] (and related works, such as [21]) to $O(n^2)$. Unfortunately, the decoding circuits still remain inefficient.

An algorithmic application of random circuits was given in [19], where they were used to construct a new class of superpolynomial quantum speedups. In that paper, random circuits of length $O(n^3)$ were used in order to guarantee that they were so-called "dispersing" circuits. Our results immediately imply that circuits of length $O(n^2)$ would instead suffice. We believe that this could be further improved with a specialised argument, since [19] assumed that the input to the random circuit was always a computational basis state.

Another potential application of random circuits is to model the evolution of black holes [22]. In Ref. [22], they conjecture that short random local quantum circuits are approximately 2-designs, and thus can be used for decoupling quantum systems (as in [2]). This, in turn, is used to make claims about the rate at which black holes leak information. While our model differs from that of Ref. [22] in that they consider nearest-neighbour interactions and we do not, our techniques and results could be readily extended to cover the case they consider.

Finally, random circuits are interesting physical models in their own right. The original purpose of [26] was to answer the physical question of how quickly entanglement grows in a system with random two party interactions. Lemma 2.11(i) shows that $O(n(n + \log 1/\epsilon))$ steps suffice (in contrast to $O(n^2(n + \log 1/\epsilon))$ which they prove) to give almost maximal entanglement in such a system.
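A Appendix

A.1 Permutation Operators

The following theorems about permutation operators will be used repeatedly.

Lemma A.1. Let $C$ be a cycle of length $c$ in $S_c$. Then

$$\operatorname{tr}\left(C(A_1 \otimes A_2 \otimes \ldots \otimes A_c)\right) = \operatorname{tr}\left(A_{C(1)} A_{C^{\circ 2}(1)} A_{C^{\circ 3}(1)} \ldots A_1\right).$$

Proof. We have

$$\operatorname{tr}\left(C(A_1 \otimes A_2 \otimes \ldots \otimes A_c)\right) = \sum_{i_1, i_2, \ldots, i_c} \langle i_1 i_2 \ldots i_c | C (A_1 \otimes A_2 \otimes \ldots \otimes A_c) | i_1 i_2 \ldots i_c \rangle$$
$$= \sum_{i_1, i_2, \ldots, i_c} \langle i_1 | A_{C(1)} | i_{C(1)} \rangle \langle i_2 | A_{C(2)} | i_{C(2)} \rangle \ldots \langle i_c | A_{C(c)} | i_{C(c)} \rangle$$
$$= \sum_{i_1, i_2, \ldots, i_c} \langle i_1 | A_{C(1)} | i_{C(1)} \rangle \langle i_{C(1)} | A_{C^{\circ 2}(1)} | i_{C^{\circ 2}(1)} \rangle \ldots \langle i_{C^{\circ (c-1)}(1)} | A_1 | i_1 \rangle$$

since $C^{\circ c}(1) = 1$. Evaluate the sum using the resolution of the identity to get the result. □

With this we can work out the Pauli expansion of the swap operator:

Lemma A.2. The swap operator $\mathcal{F}$ on two $d$-dimensional systems can be written as

$$\mathcal{F} = \frac{1}{d}\sum_p \sigma_p \otimes \sigma_p$$

where $\{\sigma_p\}$ form a Hermitian orthogonal basis with $\operatorname{tr}\sigma_p^2 = d$.

Proof. Expand $\mathcal{F}$ in the basis and use Lemma A.1:

$$\operatorname{tr}\left((\sigma_p \otimes \sigma_q)\mathcal{F}\right) = \operatorname{tr}\sigma_p\sigma_q = \begin{cases} d & p = q \\ 0 & \text{otherwise.} \end{cases}$$

The given sum has the correct coefficients in the basis, therefore $\frac{1}{d}\sum_p \sigma_p \otimes \sigma_p = \mathcal{F}$. □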
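For a single qubit pair ($d = 2$) the expansion in Lemma A.2 can be verified directly with the Pauli matrices. The following is a small self-contained check in plain Python; the helper names are illustrative and the matrices are stored as nested lists.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


# Identity, X, Y, Z: a Hermitian orthogonal basis with tr(sigma^2) = 2.
PAULIS = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

# Swap of two qubits in the basis |00>, |01>, |10>, |11>.
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]


def pauli_sum():
    """Compute (1/d) * sum_p sigma_p (x) sigma_p for d = 2."""
    total = [[0] * 4 for _ in range(4)]
    for sigma in PAULIS:
        term = kron(sigma, sigma)
        for i in range(4):
            for j in range(4):
                total[i][j] += term[i][j] / 2
    return total
```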

A.2 Zero chain mixing time proofs

A.2.1 Asymmetric Simple Random Walk

We will use some facts about asymmetric simple random walks, i.e. a random walk on a 1D line with probability $p$ of moving right at each step and probability $q = 1 - p$ of moving left.

The position of the walk after $k$ steps is tightly concentrated around $k(p - q)$:

Lemma A.3. Let $X_k$ be the random variable giving the position of a random walk after $k$ steps starting at the origin with probability $p$ of moving right and probability $q = 1 - p$ of moving left. Let $\mu = p - q$. Then for any $\eta > 0$,

$$\mathbb{P}(X_k \ge \mu k + \eta) \le \exp\left(-\frac{\eta^2}{2k}\right)$$

and

$$\mathbb{P}(X_k \le \mu k - \eta) \le \exp\left(-\frac{\eta^2}{2k}\right).$$

Proof. The standard Chernoff bound for 0/1 variables $Y_i$ gives, with $Y_i$ equal to 1 with probability $p$ and for $Y^k = \sum_{i=1}^{k} Y_i$,

$$\mathbb{P}(Y^k \ge kp + \eta) \le \exp\left(-\frac{2\eta^2}{k}\right)$$
$$\mathbb{P}(Y^k \le kp - \eta) \le \exp\left(-\frac{2\eta^2}{k}\right).$$

For our case, the $i$-th step of the walk is $2Y_i - 1$, so $X_k = 2Y^k - k$; replacing $\eta$ by $\eta/2$ gives the desired result. □
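As a sanity check on the lower-tail bound of Lemma A.3, the exact tail probability of the walk can be computed from the binomial distribution and compared with $\exp(-\eta^2/2k)$. This is only an illustrative sketch; the function names and the parameter values in the usage example are arbitrary choices, not taken from the paper.

```python
from math import comb, exp


def lower_tail(k, p, threshold):
    """Exact P(X_k <= threshold) for a walk of k steps with right-probability p."""
    total = 0.0
    for r in range(k + 1):            # r = number of right moves
        if 2 * r - k <= threshold:    # position after k steps is 2r - k
            total += comb(k, r) * p**r * (1 - p) ** (k - r)
    return total


def chernoff_bound(k, eta):
    """The bound exp(-eta^2 / (2k)) from Lemma A.3."""
    return exp(-eta**2 / (2 * k))


if __name__ == "__main__":
    k, p = 200, 0.7
    mu = 2 * p - 1
    for eta in (4, 10, 20, 40):
        print(eta, lower_tail(k, p, mu * k - eta), chernoff_bound(k, eta))
```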

This result is for a walk with constant bias. We will need a result for a walk with varying (but bounded from below) bias:

Lemma A.4. Let $X_k$ be the random variable giving the position of a random walk after $k$ steps starting at the origin with probability $p_i \ge p$ of moving right and probability $q_i \le 1 - p$ of moving left at step $i$. Let $\mu = p - (1 - p)$. Then for any $\eta > 0$,

$$\mathbb{P}(X_k \ge \mu k + \eta) \le \exp\left(-\frac{\eta^2}{2k}\right)$$

and

$$\mathbb{P}(X_k \le \mu k - \eta) \le \exp\left(-\frac{\eta^2}{2k}\right).$$

Proof. Let $Y_i$ be a random variable equal to 1 with probability $p$ and 0 with probability $1 - p$. Then let $Z_i$ be a random variable equal to 1 with probability $p_i$ and 0 with probability $1 - p_i$. Let $Y^k = \sum_{i=1}^{k} Y_i$ and $Z^k = \sum_{i=1}^{k} Z_i$. Then following the standard Chernoff bound derivation (for $\lambda > 0$),

$$\mathbb{P}(Z^k \ge kp + \eta) = \mathbb{P}\left(e^{\lambda Z^k} \ge e^{\lambda(kp + \eta)}\right) \le \frac{\mathbb{E}e^{\lambda Z^k}}{e^{\lambda(kp + \eta)}} \le \frac{\mathbb{E}e^{\lambda Y^k}}{e^{\lambda(kp + \eta)}}.$$

We can then, as above, set $Z_i = 2X_i - 1$. The calculation is similar for the bound on $\mathbb{P}(X_k \le \mu k - \eta)$. □
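From Lemma A.3 we can prove a result about how often each site is visited. If the walk runs for $t$ steps the walk is at position $t\mu$ with high probability so we might expect from symmetry that each site will have been visited about $1/\mu$ times. Below is a weaker concentration result of this form, but it is strong enough for our purposes. It says that the amount of time spent at positions $\le x$ is about $x/\mu$.

Lemma A.5. For $\gamma > 2$ and integer $x > 0$,

$$\mathbb{P}\left(\sum_{k=1}^{\infty} I(X_k \le x) \ge \gamma x/\mu\right) \le 2\exp\left(-\frac{\mu x(\gamma - 2)}{2}\right)$$

where $I$ is the indicator function.

Proof. Let $Y_k = I(X_k \le x)$. From Lemma A.3,

$$\mathbb{P}(Y_k = 0) \le \exp\left(-\frac{(k\mu - x)^2}{2k}\right)$$

for $k \le x/\mu$ and

$$\mathbb{P}(Y_k = 1) \le \exp\left(-\frac{(k\mu - x)^2}{2k}\right)$$

for $k \ge x/\mu$.

Then the quantity to evaluate is $\mathbb{P}\left(\sum_{k=1}^{\infty} Y_k \ge \gamma x/\mu\right)$. We use a standard trick to split this into two mutually exclusive possibilities and then bound the probabilities separately. Write

$$\mathbb{P}\left(\sum_{k=1}^{\infty} Y_k \ge \frac{\gamma x}{\mu}\right) = \mathbb{P}\left(\left[\sum_{k=1}^{\infty} Y_k \ge \frac{\gamma x}{\mu}\right] \cap \bigcap_{k=1}^{\gamma x/\mu} [Y_k = 1]\right) + \mathbb{P}\left(\left[\sum_{k=1}^{\infty} Y_k \ge \frac{\gamma x}{\mu}\right] \cap \bigcup_{k=1}^{\gamma x/\mu} [Y_k = 0]\right). \tag{A.1}$$

We can bound the first term:

$$\mathbb{P}\left(\left[\sum_{k=1}^{\infty} Y_k \ge \frac{\gamma x}{\mu}\right] \cap \bigcap_{k=1}^{\gamma x/\mu} [Y_k = 1]\right) \le \mathbb{P}\left(Y_{\gamma x/\mu} = 1\right) \le \exp\left(-\frac{\mu x(\gamma - 1)^2}{2\gamma}\right) \le \exp\left(-\frac{\mu x(\gamma - 2)}{2}\right).$$

The second term similarly: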




oo \ / yx/n 

e^>7^ n u k=°] 
k=i i \j=i 




□ 

The last fact we need about asymmetric simple random walks is a bound on the probability of going backwards. If $p > q$ then we expect the walk to go right in the majority of steps. The probability of going left a distance $a$ is exponentially small in $a$. This is a well known result, often stated as part of the gambler's ruin problem:

Lemma A.6 (See e.g. [17]). Consider an asymmetric simple random walk that starts at $a > 0$ and has an absorbing barrier at the origin. The probability that the walk eventually absorbs at the origin is 1 if $p \le q$ and $(q/p)^a$ otherwise.

This result is for infinitely many steps. If we only consider finitely many steps, the probability of absorption must be at most this.
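The finite-step remark can be illustrated by evolving the distribution of the walk exactly and accumulating the mass absorbed at the origin. The sketch below is illustrative only (function name and parameters are arbitrary); for $p > q$ the absorbed mass stays below $(q/p)^a$ and approaches it as the number of steps grows.

```python
def absorb_prob(a, p, steps):
    """Probability that a walk started at a > 0 hits the origin within `steps` steps."""
    dist = {a: 1.0}      # probability mass at each non-absorbed position
    absorbed = 0.0
    for _ in range(steps):
        nxt = {}
        for pos, weight in dist.items():
            for new, prob in ((pos + 1, p), (pos - 1, 1 - p)):
                if new == 0:
                    absorbed += weight * prob   # absorbing barrier at the origin
                else:
                    nxt[new] = nxt.get(new, 0.0) + weight * prob
        dist = nxt
    return absorbed


if __name__ == "__main__":
    p, a = 0.7, 3
    print(absorb_prob(a, p, 10), absorb_prob(a, p, 400), ((1 - p) / p) ** a)
```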



A.2.2 Waiting Time

From above we saw that the probability of moving is at least $2x/5n$ when at position $x$. The length of time spent waiting at each step is therefore stochastically dominated by a geometric distribution with parameter $2x/5n$. The following concentration result will be used to bound the waiting time (in our case $\beta = 2/5$):

Lemma A.7. Let the waiting time at each site be $W(x) \sim \mathrm{Geo}(\beta x/n)$, the total waiting time $W = \sum_{x=1}^{t} W(x)$ and $t' = \frac{n \ln t}{\beta}$. Then

$$\mathbb{P}(W \ge Ct') \le 2t^{(1-C)/2}.$$

Proof. By Markov's inequality for $\lambda > 0$,

$$\mathbb{P}(W \ge Ct') \le \frac{\mathbb{E}e^{\lambda W}}{e^{\lambda Ct'}}.$$

The $W(x)$ are independent so

$$\mathbb{E}e^{\lambda W} = \prod_{x=1}^{t} \mathbb{E}e^{\lambda W(x)}.$$

Summing the geometric series we find

$$\mathbb{E}e^{\lambda W(x)} = \frac{\frac{\beta x}{n}}{e^{-\lambda} - 1 + \frac{\beta x}{n}}$$

provided $e^{\lambda} < \frac{1}{1 - \beta x/n}$ for all $1 \le x \le t$. Therefore $e^{\lambda}$ is of the form $\frac{1}{1 - a\beta/n}$ where $0 \le a < 1$. With this,

$$\mathbb{E}e^{\lambda W(x)} = \frac{x}{x - a}$$

and

$$\mathbb{E}e^{\lambda W} = \frac{t!\,\Gamma(1 - a)}{\Gamma(t + 1 - a)}.$$

We are free to choose $a$ within its range to optimise the bound. However, for simplicity, we will choose $a = 1/2$. From Lemma A.14,

$$\mathbb{E}e^{\lambda W} \le 2\sqrt{t}.$$

The result follows, using the inequality $1 - x \le e^{-x}$. □
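The closed form $\mathbb{E}e^{\lambda W(x)} = x/(x - a)$ can be checked numerically by summing the geometric series directly. The sketch below assumes the geometric distribution is supported on $\{1, 2, \ldots\}$ (number of steps up to and including the move) and uses $e^{\lambda} = 1/(1 - a\beta/n)$; the function name and the truncation length are illustrative choices.

```python
def geometric_mgf(r, e_lam, terms=20000):
    """Truncated sum of r (1 - r)^(w-1) e^(lambda w) over w = 1, 2, ..., terms."""
    total = 0.0
    term = r * e_lam            # the w = 1 term
    ratio = (1 - r) * e_lam     # must be < 1 for the series to converge
    for _ in range(terms):
        total += term
        term *= ratio
    return total


def closed_form_check(n, beta, x, a):
    """Return (series value, x / (x - a)) for e^lambda = 1 / (1 - a beta / n)."""
    e_lam = 1 / (1 - a * beta / n)
    return geometric_mgf(beta * x / n, e_lam), x / (x - a)


if __name__ == "__main__":
    print(closed_form_check(n=50, beta=0.4, x=1, a=0.5))   # both values close to 2
```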
A.2.3 Phase 1

Here we prove that phase 1 completes successfully with high probability. The bias here is large so the walk moves right every time with high probability:

Lemma A.8. The probability that the accelerated chain moves right at each step, starting from $x = 1$ for $t$ steps, is at least

$$1 - t^2/n.$$

Proof. The probability of moving right at each step is

$$\prod_{x=1}^{t} \frac{3(n-x)}{3n - 2x - 1} = \frac{(n-2)(n-3)\ldots(n-t)}{(n - 5/3)(n - 7/3)\ldots(n - (2t+1)/3)}$$
$$\ge (1 - 2/n)(1 - 3/n)\ldots(1 - t/n)$$
$$\ge (1 - t/n)^t \ge 1 - t^2/n. \quad □$$
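The bound in Lemma A.8 can be checked exactly for particular values of $n$ and $t$ by multiplying the accelerated-chain probabilities of moving right. This is an illustrative sketch with an assumed function name, using rational arithmetic to avoid rounding.

```python
from fractions import Fraction


def all_right_probability(n, t):
    """Probability that the accelerated chain moves right at each of t steps from x = 1."""
    prob = Fraction(1)
    for x in range(1, t + 1):
        prob *= Fraction(3 * (n - x), 3 * n - 2 * x - 1)   # move x -> x + 1
    return prob


if __name__ == "__main__":
    n, t = 10000, 50          # t = n^(1/2 - small), so t^2 / n is small
    print(float(all_right_probability(n, t)), 1 - t * t / n)
```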

Let $t = n^\delta$. Provided $\delta < 1/2$ this probability is close to one. Therefore, with high probability, the walk moves to $n^\delta$ in $n^\delta$ steps. Using Lemma A.7 the waiting time can be bounded:

Lemma A.9. Let $W^{(1)}$ be the waiting time during phase 1. Let $H$ be the event that the walk moves right at each step. Then

$$\mathbb{P}\left(W^{(1)} \ge Ct' \,\middle|\, H\right) \le 2n^{\delta(1-C)/2} \tag{A.2}$$

where $t' = \frac{5\delta n \ln n}{2}$.

Proof. This follows directly from Lemma A.7 since each site is hit exactly once. □
We now combine these two lemmas to prove that phase 1 completes successfully with high probability:

Proof of Lemma 5.8. In Lemma A.8 we show that in $n^\delta$ accelerated steps, the walk moves right at each step with probability $\ge 1 - n^{2\delta-1}$. Call this event $H$. Then $\mathbb{P}(H) \ge 1 - n^{2\delta-1}$. Lemma A.9 shows that the waiting time is bounded with high probability (choosing $C = 3$):

$$\mathbb{P}\left(W^{(1)} \le 15\delta n \ln n/2 \,\middle|\, H\right) \ge 1 - 2n^{-\delta}.$$

Then we can bound the probability of phase 1 completing successfully:

$$\mathbb{P}(\text{Phase 1 completes successfully}) \ge \mathbb{P}\left(H \cap [W^{(1)} \le 15\delta n \ln n/2]\right)$$
$$= \mathbb{P}(H)\,\mathbb{P}\left(W^{(1)} \le 15\delta n \ln n/2 \,\middle|\, H\right)$$
$$\ge (1 - n^{2\delta-1})(1 - 2n^{-\delta})$$
$$\ge 1 - n^{2\delta-1} - 2n^{-\delta}. \quad □$$

A.2.4 Phase 2

Phase 2 starts at $n^\delta/2$ and finishes when the walk has reached $\theta n$ for some constant $0 < \theta < 3/4$. We show that, with high probability, this also takes time $O(n \log n)$. The probability of moving right during this phase is at least $p = \frac{3(1-\theta)}{3-2\theta}$. We first define some constants that we will derive bounds in terms of. Let $\gamma$ be a constant $> 2$. Let $\mu = p - (1 - p)$ and $\tilde\mu = \mu/\gamma$. Finally let $s = \tilde\mu t$ for some $t$ (which will be the number of accelerated steps). Then, with high probability, the walk will have passed $s$ after $t$ steps:

Lemma A.10. Let $X_t$ be the position of the walk at accelerated step $t$, where $X_0 = n^\delta$. Then

$$\mathbb{P}(X_t \le s) \le \exp\left(-\mu^2 t(1 - 1/\gamma)^2/2\right).$$

Proof. Let $X'_t = X_t - n^\delta$. Then from Lemma A.4,

$$\mathbb{P}(X'_t \le \mu t - \eta) \le \exp\left(-\frac{\eta^2}{2t}\right).$$

Now let $\eta = \mu t - s$ and use

$$\mathbb{P}(X_t \le s) = \mathbb{P}(X'_t \le s - n^\delta) \le \mathbb{P}(X'_t \le s)$$

to complete the proof. □
We now prove a bound on the waiting time:

Lemma A.11. Let $W^{(2)}$ be the waiting time in phase 2. Then, assuming the walk does not go back beyond $n^\delta/2$,

$$\mathbb{P}\left(W^{(2)} \ge \frac{15 n \ln s}{\mu}\right) \le (4/s)^{3/2\mu} + \frac{2\exp\left(-\mu n^\delta/4\right)}{1 - \exp(-\mu/2)}. \tag{A.3}$$

Proof. Let $W_k \sim \mathrm{Geo}\left(\frac{2X_k}{5n}\right)$ where $X_k$ is the position of the walk at accelerated step $k$ ($X_0 = n^\delta$). We want to bound (w.h.p.) the waiting time $W^{(2)} = \sum_{k=1}^{t} W_k$ of $t$ steps of the accelerated walk.
Define the event H to be 



H = < 



n 

K X>n s /2 



J^KX k <x)< x/Pl 



.k=l 



(A.4) 



If $H$ occurs, no sites have been hit too often and the walk has not gone back further than $n^\delta/2$. It is important that we also use the restriction that $X_k \ge n^\delta/2$ because the waiting time grows the longer the walk moves back. However, it is very unlikely that the walk will go backwards (even to $n^\delta/2$).

We now define some more notation to bound the waiting time. Let X = (X\, X2, ■ ■ ■ , X t ) be a 
tuple of positions and let N X (X.) be the number of times that x appears in X and let N(X) = 
(AT 1 (X),7V 2 (X),...,iV n (X)). Then we have E.A^X) = t. 

As we said above, the waiting time at x = a stochastically dominates the waiting time at x = b for 
b > a. In other words, 

W k > W k ' if X k < X k ' (A.5) 
where X > Y means that X stochastically dominates Y. Now write the waiting time for all steps 



wW(x) = Y,Wk 

k=l 

N X (X) 

= E E w »w 

x h=l 

where W h (x) ~ Geo (§f). 

If event H occurs, we can put some bounds on N x . We find that, for all x > n s /2, 

X 

£ Ny(X) K X/fl 

y=n s /2 



(A.6) 



(A.7) 



and iV x (X) = for x < n s /2. Now let X m be such that N n s /2 (X m ) = and N x (X m ) = 1//2 for 
x > n 5 /2. Then 



^2 N y(^ra) = x/fl. 
y=n s /2 

Now we introduce the relation ^: 

Definition A. 12. Let x and y be n-tuples. Then x <y if 

k k 

E x ^E^ 



(A.8) 



(A.9) 



i=i 



i=i 



for all 1 < k < n with equality for k = n. 






Note that this is like majorisation, except the elements of the tuples are not sorted. Using this, we 
find that N(X) * N(X m ) (Using Ey-^( X ) = ^2 y N yi X ') = * for a11 X,X'.) 

If we combine Equations \KE and EH we find that W^(X) > W®{X.') if N(X) y N(X'). Roughly 
speaking, this is simply saying that the waiting time is larger if the earlier sites are hit more often. 
But since for all X that satisfy H, X ^ X m , we have W^(X) < W^(X m ) provided H occurs. 
We will simplify further by noting that X m X Xo where N x (X.q) = 1//2 for 1 < x < jit = s and 
zero elsewhere. Therefore 



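$$\mathbb{P}\left(W^{(2)}(\mathbf{X}) \ge \frac{5Cn \ln s}{2\tilde\mu} \,\middle|\, H\right) \le \mathbb{P}\left(W^{(2)}(\mathbf{X}_0) \ge \frac{5Cn \ln s}{2\tilde\mu}\right).$$

We can bound this by applying Lemma A.7. Let $W_h = \sum_{x=1}^{s} W_h(x)$. From Lemma A.7,

$$\mathbb{P}(W_h \ge Ct') \le 2s^{\frac{1-C}{2}} \tag{A.10}$$

where $t' = \frac{5n \ln s}{2}$. However, we want a bound on $\mathbb{P}\left(\sum_{h=1}^{1/\tilde\mu} W_h \ge Ct'/\tilde\mu\right)$. The same reasoning as in Lemma A.7 bounds this as

$$\mathbb{P}\left(\sum_{h=1}^{1/\tilde\mu} W_h \ge Ct'/\tilde\mu\right) \le \left(2s^{\frac{1-C}{2}}\right)^{1/\tilde\mu}. \tag{A.11}$$

Therefore

$$\mathbb{P}\left(W^{(2)} \ge \frac{5Cn \ln s}{2\tilde\mu} \,\middle|\, H\right) \le \left(2s^{\frac{1-C}{2}}\right)^{1/\tilde\mu}. \tag{A.12}$$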



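To complete the proof, we just need to find $\mathbb{P}(H^c)$. We can bound it using the union bound and Lemma A.5:

$$\mathbb{P}(H^c) = \mathbb{P}\left(\bigcup_{x \ge n^\delta/2}\left[\sum_{k=1}^{\infty} I(X_k \le x) \ge x/\tilde\mu\right]\right)$$
$$\le \sum_{x=n^\delta/2}^{\infty} \mathbb{P}\left(\sum_{k=1}^{\infty} I(X_k \le x) \ge x/\tilde\mu\right)$$
$$\le \sum_{x=n^\delta/2}^{\infty} 2\exp\left(-\frac{\mu x(\gamma - 2)}{2}\right)$$
$$= \frac{2\exp\left(-\frac{\mu n^\delta(\gamma - 2)}{4}\right)}{1 - \exp\left(-\frac{\mu(\gamma - 2)}{2}\right)}.$$

Now, for any events $A$ and $B$,

$$\mathbb{P}(A) = \mathbb{P}(A \cap B) + \mathbb{P}(A \cap B^c)$$
$$= \mathbb{P}(A|B)\mathbb{P}(B) + \mathbb{P}(A \cap B^c)$$
$$\le \mathbb{P}(A|B) + \mathbb{P}(B^c)$$

and set $C = 2$ and $\gamma = 3$ to obtain the result. □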

We now combine these two lemmas to prove that phase 2 completes successfully with high probability:

Proof of Lemma 5.9. Phase 2 can fail if:

• The walk does not reach $\theta n$. The probability of this is bounded by Lemma A.10:

$$\mathbb{P}(X_t \le \theta n) \le \exp\left(-\frac{2}{3}\mu\theta n\right).$$

This follows from setting $t = \frac{3\theta n}{\mu}$ and $\gamma = 3$.

• The waiting time is too long. This probability is bounded by Lemma A.11:

$$\mathbb{P}\left(W^{(2)} \ge \frac{15 n \ln(\theta n)}{\mu}\right) \le \left(\frac{4}{\theta n}\right)^{\frac{3}{2\mu}} + \frac{2\exp\left(-\mu n^\delta/4\right)}{1 - \exp(-\mu/2)} + (q/p)^{n^\delta/2}.$$

• The walk gets back to $n^\delta/2$. This is bounded by Lemma A.6:

$$\mathbb{P}\left(\text{Walk gets to } n^\delta/2\right) \le (q/p)^{n^\delta/2}.$$

So, using the union bound we can bound the overall probability of failure:

$$\mathbb{P}(\text{Phase 2 fails}) \le \exp\left(-\frac{2}{3}\mu\theta n\right) + \left(\frac{4}{\theta n}\right)^{\frac{3}{2\mu}} + \frac{2\exp\left(-\mu n^\delta/4\right)}{1 - \exp(-\mu/2)} + (q/p)^{n^\delta/2}. \quad □$$

A.2.5 Phase 3

This phase starts at $\theta n$. We show that this mixes quickly using log-Sobolev arguments.

Lemma A.13. The zero chain on the restricted state space $x \in [m, n]$ where $m = \theta n/2$ for $0 < \theta < 3/4$ has mixing time $O(n \log \frac{1}{\epsilon})$.

Proof. We restrict the Markov chain to only run from $m$ by adjusting the holding probability at $m$, $P(m, m)$. Construct the chain $P'$ with transition matrix

$$P'(x,y) = \begin{cases} 0 & x < m \text{ or } y < m \\ 1 - P(m, m+1) & x = y = m \\ P(x,y) & \text{otherwise} \end{cases} \tag{A.13}$$

where $P$ is the transition matrix of the full zero chain. This chain then has stationary distribution

$$\pi'(x) = \begin{cases} \pi(x)/(1 - w) & m \le x \le n \\ 0 & \text{otherwise} \end{cases} \tag{A.14}$$

where $w = \sum_{x=1}^{m-1} \pi(x)$. To see this, first note that the distribution is normalised. We want to show that

$$\sum_{x=m}^{n} P'(x,y)\pi'(x) = \pi'(y). \tag{A.15}$$

When $y = m$ we are required to prove that $P'(m,m)\pi'(m) + P'(m+1,m)\pi'(m+1) = \pi'(m)$. This follows from the reversibility of the unrestricted zero chain, using $P'(m,m) = 1 - P(m,m+1)$. For $y > m$, Eqn. A.15 is satisfied simply because $\pi(x)$ is the stationary distribution of $P$ and related by a constant factor to $\pi'(x)$.

We can now prove this final mixing time result, making use of Lemma 4.10. Let $Q_i$ be the chain that uniformly mixes site $i$. This converges in one step and has a log-Sobolev constant independent of $n$; call it $\rho_1$. Let $Q$ be the chain that chooses a site at random and then uniformly mixes that site. This is the product chain of the $Q_i$ so, by Lemma 4.10, has gap $1/n$ and Sobolev constant $\rho_Q = \rho_1/n$. We can construct the zero chain for this and find its Sobolev constant.

The Sobolev constant is defined (Definition 4.8) in terms of a minimisation over functions on the state space. For the chain $Q$ we can write

$$\rho_Q = \inf_{\phi} f(\phi).$$

If we restrict the infimum to be over functions $\phi$ with $\phi(x) = \phi(y)$ for $x$ and $y$ containing the same number of zeroes then we obtain the Sobolev constant for the zero-$Q$ chain, $\rho_{Q_0}$, which is the chain that counts the number of zeroes in the full chain $Q$. Since taking the infimum over fewer functions cannot give a smaller value,

$$\rho_{Q_0} \ge \rho_Q \ge \rho_1/n.$$

We can now compare this chain to the zero-$P$ chain. The stationary distributions are the same. The transition matrix for the zero-$Q$ chain is

$$Q_0(x,y) = \begin{cases} \frac{n+2x}{4n} & y = x \\ \frac{x}{4n} & y = x - 1 \\ \frac{3(n-x)}{4n} & y = x + 1 \\ 0 & \text{otherwise.} \end{cases}$$

Then construct $Q_0'$ by restricting the space to only run from $m$ in exactly the same way as $P'$ is constructed from $P$. $Q_0'$ has the same stationary distribution as $P'$. Now we can perform the comparison. From Eqn. 4.14,

$$A = \max_{a \ge m} \frac{Q_0'(a, a+1)}{P'(a, a+1)} = \max_{a \ge m} \frac{5(n-1)}{8a} = \frac{5(n-1)}{8m} \le \frac{5}{4\theta}.$$

Therefore $\rho_{P'} \ge \frac{4\theta}{5}\frac{\rho_1}{n}$. Exactly the same argument applies to show the gap is $\Omega(1/n)$ so the mixing time is (from Eqn. 4.16) $O(n \log \frac{1}{\epsilon})$. □
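The properties of the zero-$Q$ chain used in this comparison can be verified exactly for small $n$. The sketch below checks that each row of $Q_0$ sums to one, that $Q_0$ satisfies detailed balance with weights proportional to $3^x\binom{n}{x}$, and that the ratio of forward probabilities is $5(n-1)/(8a)$. It assumes the zero-$P$ chain moves forward with probability $6x(n-x)/(5n(n-1))$; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def q0_row(n, x):
    """(down, stay, up) probabilities of the zero-Q chain at x non-zero sites."""
    return Fraction(x, 4 * n), Fraction(n + 2 * x, 4 * n), Fraction(3 * (n - x), 4 * n)


def p_up(n, x):
    """Assumed forward probability of the zero-P chain."""
    return Fraction(6 * x * (n - x), 5 * n * (n - 1))


def weight(n, x):
    """Unnormalised stationary weight 3^x C(n, x)."""
    return 3**x * comb(n, x)


def comparison_ratio(n, a):
    """Ratio Q_0(a, a+1) / P(a, a+1), defined for 1 <= a < n."""
    return q0_row(n, a)[2] / p_up(n, a)
```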
Now we can prove that phase 3 completes successfully with high probability:

Proof of Lemma 5.10. In Lemma A.13 we show that after $O(n \log \frac{1}{\epsilon})$ steps the chain mixes to distance $\epsilon$. We just need to show that the walk goes back to $\theta n/2$ with small probability. This follows from Lemma A.6. □

A.3 Moment Generating Function Calculations

The following lemma is needed in the moment generating function calculations.

Lemma A.14. For integer $s > 0$,

$$\frac{\Gamma(s+1)\Gamma(1/2)}{\Gamma(s + 1/2)} \le 2\sqrt{s}. \tag{A.16}$$

Proof. From expanding the $\Gamma$ functions, the left-hand side of Eqn. A.16 becomes

$$\frac{s!\,2^s}{(2s-1)!!} = \frac{2 \times 4 \times 6 \times \ldots \times 2(s-1) \times 2s}{1 \times 3 \times 5 \times \ldots \times (2s-3) \times (2s-1)} = \prod_{x=1}^{s} \frac{2x}{2x-1}.$$

We then proceed by induction. $\prod_{x=1}^{1} \frac{2x}{2x-1} = 2$ and by the inductive hypothesis

$$\prod_{x=1}^{s+1} \frac{2x}{2x-1} \le \frac{2(s+1)}{2(s+1)-1}\,2\sqrt{s}.$$

It is easy to show that $\frac{2(s+1)}{2(s+1)-1}\sqrt{s} \le \sqrt{s+1}$ and the result follows. □
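The inequality in Lemma A.14 is easy to check numerically using the product form from the proof. The sketch below is illustrative only: the function name is an assumption, and the product is used rather than evaluating the $\Gamma$ functions directly in order to avoid overflow for large $s$.

```python
from math import gamma, sqrt


def gamma_ratio(s):
    """Gamma(s+1) Gamma(1/2) / Gamma(s+1/2), computed as prod_{x=1}^{s} 2x / (2x - 1)."""
    ratio = 1.0
    for x in range(1, s + 1):
        ratio *= 2 * x / (2 * x - 1)
    return ratio


if __name__ == "__main__":
    for s in (1, 2, 10, 100):
        print(s, gamma_ratio(s), 2 * sqrt(s))
    # Cross-check the product form against the Gamma functions for a small s.
    print(gamma(6) * gamma(0.5) / gamma(5.5), gamma_ratio(5))
```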

A.4 Mixing Times

We find bounds for the mixing time above that are valid with high probability. Below we turn these into full mixing time bounds.

Lemma A.15. If after $O(n \log n)$ steps the state $v$ of a random walk satisfies

$$\|v - \pi\| \le \delta$$

where $\pi$ is the stationary distribution and $\delta$ is $1/\mathrm{poly}(n)$ then the number of steps required to be at most a distance $\epsilon$ from stationarity is

$$O\left(n \log \frac{1}{\epsilon}\right).$$

Proof. Let $s$ be the slowest mixing initial state. Then, after $t = O(n \log n)$ steps we have at worst the state

$$(1 - \delta)\pi + \delta s$$

and if we repeat this $k$ times $\delta$ becomes $\delta^k$. So to get a distance $\epsilon$, $k = \frac{\log \epsilon}{\log \delta}$. Now we evaluate the mixing time:

$$kt = O(n \log n)\frac{\log \epsilon}{\log \delta}$$
$$= O(n \log n)\frac{\log 1/\epsilon}{\log 1/\delta}$$
$$= O(n \max(\log n, \log 1/\epsilon))$$
$$= O\left(n \log \frac{1}{\epsilon}\right).$$

□
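The amplification step in Lemma A.15 amounts to counting how many rounds of $t$ steps are needed before $\delta^k \le \epsilon$. A minimal sketch follows, with an illustrative function name; the values in the usage example are arbitrary.

```python
def rounds_needed(eps, delta):
    """Smallest k with delta^k <= eps, for 0 < delta < 1 and 0 < eps < 1."""
    k, dist = 0, 1.0
    while dist > eps:
        dist *= delta     # each round of t steps shrinks the distance by a factor delta
        k += 1
    return k


if __name__ == "__main__":
    n = 1000
    delta = 1 / n         # delta = 1/poly(n)
    for eps in (1e-2, 1e-5, 1e-11):
        print(eps, rounds_needed(eps, delta))
```

Because $\log 1/\delta = \Theta(\log n)$, the number of rounds is about $\log(1/\epsilon)/\log n$, which cancels the $\log n$ factor in $t$.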